Empirical distribution function

In statistics, an empirical distribution function is a cumulative probability distribution function that concentrates probability 1/n at each of the n numbers in a sample.

Let $$X_1,\ldots,X_n$$ be independent and identically distributed (iid) real-valued random variables with common cumulative distribution function (cdf) F(x).

The empirical distribution function $$F_n(x)$$ based on the sample $$X_1,\ldots,X_n$$ is a step function defined by


 * $$F_n(x) = \frac{\mbox{number of elements in the sample} \le x}{n} = \frac{1}{n} \sum_{i=1}^n I(X_i \le x),$$

where I(A) is the indicator of event A.
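The definition translates directly into code. The following is a minimal sketch using only the Python standard library; the function name `ecdf` and the example sample are illustrative choices, not part of the definition.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function F_n of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x):
        # bisect_right returns the number of sorted sample points that are <= x,
        # i.e. the sum of the indicators I(X_i <= x).
        return bisect_right(xs, x) / n

    return F_n

F = ecdf([3.0, 1.0, 2.0, 2.0])
print(F(0.5), F(1.0), F(2.0), F(2.5), F(3.0))   # 0.0 0.25 0.75 0.75 1.0
```

Sorting once and using binary search makes each evaluation cost O(log n); a repeated value in the sample simply produces a jump that is a multiple of 1/n.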

For fixed x, each $$I(X_i\leq x)$$ is a Bernoulli random variable with parameter p=F(x), and these indicators are independent because the $$X_i$$ are. Hence $$nF_n(x)$$ is a binomial random variable with mean nF(x) and variance nF(x)(1-F(x)).
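The binomial mean and variance can be checked by simulation. The sketch below assumes a Uniform(0, 1) sample, so that F(x) = x; the sample size, evaluation point, number of replications and seed are arbitrary illustrative values.

```python
import random

def count_at_most(sample, x):
    """n * F_n(x): the number of sample points that are <= x."""
    return sum(1 for v in sample if v <= x)

rng = random.Random(0)        # fixed seed so the run is reproducible
n, x, reps = 50, 0.3, 20000   # Uniform(0, 1) sample, so F(x) = x

# Draw `reps` independent samples of size n and record n * F_n(x) for each.
counts = [count_at_most([rng.random() for _ in range(n)], x) for _ in range(reps)]

mean = sum(counts) / reps
var = sum((c - mean) ** 2 for c in counts) / (reps - 1)

print(mean, n * x)            # simulated mean vs. n F(x)
print(var, n * x * (1 - x))   # simulated variance vs. n F(x)(1 - F(x))
```

With these settings the theoretical values are 15 and 10.5, and the simulated values should lie close to them.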

Asymptotic properties

 * By the strong law of large numbers,
 * $$F_n(x)\to F(x)$$ almost surely for fixed x.
 * In other words, $$F_n(x)$$ is a strongly consistent estimator of the cumulative distribution function F(x). It is also unbiased, since $$E[F_n(x)] = F(x)$$ for every n.


 * By the central limit theorem,
 * $$\sqrt{n}(F_n(x)-F(x))$$ converges in distribution to a normal distribution N(0,F(x)(1-F(x))) for fixed x.
 * The Berry–Esséen theorem provides the rate of this convergence.
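The normal approximation in the central limit statement can be illustrated numerically. The sketch below again assumes a Uniform(0, 1) sample (so F(x) = x) and checks how often $$\sqrt{n}(F_n(x)-F(x))$$ falls within 1.96 limiting standard deviations of zero; all numerical settings are illustrative.

```python
import math
import random

rng = random.Random(3)          # fixed seed for reproducibility
n, x, reps = 200, 0.3, 4000     # Uniform(0, 1) sample, so F(x) = x
sigma = math.sqrt(x * (1 - x))  # standard deviation of the limiting normal law

def scaled_error():
    """One draw of sqrt(n) * (F_n(x) - F(x)) from a fresh sample of size n."""
    f_n = sum(1 for _ in range(n) if rng.random() <= x) / n
    return math.sqrt(n) * (f_n - x)

draws = [scaled_error() for _ in range(reps)]

# Fraction of draws inside the central 95% region of N(0, F(x)(1 - F(x))).
inside = sum(1 for z in draws if abs(z) <= 1.96 * sigma) / reps
print(inside)   # should be close to 0.95 if the normal approximation is good
```

Because $$nF_n(x)$$ is discrete, the fraction is only approximately 0.95 for finite n.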


 * By the Glivenko–Cantelli theorem, $$F_n(x)\to F(x)$$ uniformly over x, that is,
 * $$\|F_n-F\|_\infty = \sup_x |F_n(x)-F(x)| \to 0$$ with probability 1.
 * The Dvoretzky–Kiefer–Wolfowitz inequality bounds the rate of this convergence.
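Uniform convergence can be observed by computing the supremum distance for growing n. The sketch below assumes a continuous F (here Uniform(0, 1)) and compares the distance with the half-width obtained from the Dvoretzky–Kiefer–Wolfowitz inequality in its usual form, $$P(\sup_x |F_n(x)-F(x)| > \varepsilon) \le 2e^{-2n\varepsilon^2}$$. That form is quoted here as an assumption, since the text above does not state it; the helper names are hypothetical.

```python
import math
import random

def sup_distance(sample, cdf):
    """sup_x |F_n(x) - F(x)| for a continuous cdf F.

    F_n is constant between sample points and F is monotone, so it is
    enough to compare F with F_n on both sides of each jump of F_n.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(v), cdf(v) - i / n)
        for i, v in enumerate(xs)
    )

def dkw_epsilon(n, alpha):
    """Half-width eps solving 2 * exp(-2 * n * eps**2) = alpha (assumed DKW form)."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))

def uniform_cdf(t):
    """cdf of the Uniform(0, 1) distribution."""
    return min(max(t, 0.0), 1.0)

rng = random.Random(1)   # fixed seed for reproducibility
for n in (100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    # Columns: n, observed sup distance, 95% DKW half-width.
    print(n, sup_distance(sample, uniform_cdf), dkw_epsilon(n, 0.05))
```

Both columns shrink at the rate $$1/\sqrt{n}$$, and the observed distance should fall below the half-width in at least 95% of runs.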


 * Kolmogorov showed that
 * $$\sqrt{n}\,\|F_n-F\|_\infty$$ converges in distribution to the Kolmogorov distribution, provided that F is continuous.
 * The Kolmogorov–Smirnov test for goodness-of-fit is based on this fact.
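A goodness-of-fit test along these lines can be sketched as follows. The code assumes a continuous hypothesised cdf and uses the standard series for the tail of the Kolmogorov distribution, $$P(K > t) = 2\sum_{k\ge 1} (-1)^{k-1} e^{-2k^2t^2}$$, which is quoted as an assumption because it is not given in the text above. The function names are hypothetical, and the resulting p-value is only asymptotic.

```python
import math
import random

def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesised cdf F."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n) for i, v in enumerate(xs))

def kolmogorov_sf(t, terms=100):
    """Approximate P(K > t) for the Kolmogorov distribution."""
    if t < 0.2:
        # The cdf is negligible (below 1e-9) here, and the series converges slowly.
        return 1.0
    s = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * t * t)
                for k in range(1, terms + 1))
    return min(max(s, 0.0), 1.0)

rng = random.Random(2)   # fixed seed for reproducibility
n = 500
# Squares of Uniform(0, 1) draws: the true cdf on [0, 1] is sqrt(t), not t.
sample = [rng.random() ** 2 for _ in range(n)]

for name, cdf in [("sqrt", math.sqrt), ("uniform", lambda t: t)]:
    d = ks_statistic(sample, cdf)
    # Columns: hypothesis, D_n, asymptotic p-value from sqrt(n) * D_n.
    print(name, d, kolmogorov_sf(math.sqrt(n) * d))
```

The incorrect uniform hypothesis should receive a p-value very close to zero, while the correct hypothesis typically does not.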


 * By Donsker's theorem,
 * $$\sqrt{n}(F_n-F)$$, as a process indexed by x, converges weakly in $$\ell^\infty(\mathbb{R})$$ to the Gaussian process B(F(x)), where B is a standard Brownian bridge on [0,1].