True variance

In statistics, the term true variance is often used to refer to the unobservable variance of a whole finite population, as distinguished from an observable statistic based on a sample. Suppose a number, such as a person's height, income, age, or cholesterol level, is assigned to every member of a population of n individuals. Let x_i be the number assigned to the i-th individual, for i = 1, ..., n. Then the variance is


 * $$\sigma^2={1 \over n}\sum_{i=1}^n (x_i-\overline{x})^2,\quad\quad\quad(1)$$

where


 * $$\mu=\overline{x}={x_1+\cdots+x_n \over n}$$

is the population mean. If x_i were the i-th member of a random sample rather than of the whole population, then one sometimes uses the same function seen in (1) above as an estimate of the "true variance" or "population variance" σ². But sometimes one replaces n with n − 1, or n + 1, or otherwise alters expression (1) in order to estimate σ². In particular, using n − 1 makes the estimator unbiased, and in some commonly considered contexts, using n + 1 minimizes the mean squared error of estimation.
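The divisor-n and divisor-(n − 1) conventions can be compared with a short sketch in Python; the data values below are arbitrary and purely illustrative, and the standard-library statistics module happens to implement both conventions (as pvariance and variance).

```python
import statistics

# A hypothetical "population" of n = 5 values (illustrative only).
data = [2.0, 4.0, 4.0, 4.0, 6.0]
n = len(data)
mean = sum(data) / n

# "True" (population) variance: divisor n, as in expression (1).
pop_var = sum((x - mean) ** 2 for x in data) / n

# Unbiased sample estimate of the variance: divisor n - 1.
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)

# The standard library implements both conventions.
assert pop_var == statistics.pvariance(data)    # divisor n
assert sample_var == statistics.variance(data)  # divisor n - 1
```

Treating the same five numbers as a whole population gives a smaller value than treating them as a sample, since dividing by n − 1 inflates the estimate just enough to remove the bias of expression (1).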

Statisticians do not normally use the Greek letters μ and σ for estimates based on samples, but only for the (often unobservable) characteristics of whole populations. Because the "true" or "population" variance uses the divisor n rather than n − 1, those concerned with computation sometimes call expression (1), with divisor n, the "true variance" without regard to whether it is an estimate or a characteristic of a whole population or a random sample.