Empirical process


 * For the process control topic, see Empirical process (process control model).

The study of empirical processes is a branch of mathematical statistics and a sub-area of probability theory. It is a generalization of the central limit theorem for the empirical measures.

Definition
It is known that under certain conditions empirical measures $$P_n$$ uniformly converge to the probability measure P (see Glivenko-Cantelli theorem). Empirical processes provide rate of this convergence.

A centered and scaled version of the empirical measure is the signed measure
 * $$G_n(A)=\sqrt{n}(P_n(A)-P(A))$$

It induces map on measurable functions f given by


 * $$f\mapsto G_n f=\sqrt{n}(P_n-P)f=\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n f(X_i)-\mathbb{E}f\right)$$

By the central limit theorem, $$G_n(A)$$ converges in distribution to a normal random variable N(0,P(A)(1-P(A))) for fixed measurable set A. Similarly, for a fixed function f, $$G_nf$$ converges in distribution to a normal random variable $$N(0,\mathbb{E}(f-\mathbb{E}f)^2)$$, provided that $$\mathbb{E}f$$ and $$\mathbb{E}f^2$$ exist.

Definition
 * $$\bigl(G_n(c)\bigr)_{c\in\mathcal{C}}$$ is called empirical process indexed by $$\mathcal{C}$$, a collection of measurable subsets of S.
 * $$\bigl(G_nf\bigr)_{f\in\mathcal{F}}$$ is called empirical process indexed by $$\mathcal{F}$$, a collection of measurable functions from S to $$\mathbb{R}$$.

A significant result in the area of empirical processes is Donsker's theorem. It has led to a study of the Donsker classes such that empirical processes indexed by these classes converge weakly to a certain Gaussian process. It can be shown that the Donsker classes are Glivenko-Cantelli, the converse is not true in general.

Example
As an example, consider empirical distribution functions. For real-valued iid random variables $$X_1,X_n,...$$ they are given by


 * $$F_n(x)=P_n((-\infty,x])=P_nI_{(-\infty,x]}.$$

In this case, empirical processes are indexed by a class $$\mathcal{C}=\{(-\infty,x]:x\in\mathbb{R}\}.$$ It has been shown that $$\mathcal{C}$$ is a Donsker class, in particular,
 * $$\sqrt{n}(F_n(x)-F(x))$$ converges weakly in $$\ell^\infty(\mathbb{R})$$ to a Brownian bridge B(F(x)).