Unbiased estimation of standard deviation

In statistics, the standard deviation of a population is often estimated from a random sample drawn from it. The most commonly used estimator is the sample standard deviation, defined by

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}\,,$$ where $$\{x_1,x_2,\ldots,x_n\}$$ is the sample (formally, realizations from a random variable X) and $$\overline{x}$$ is the sample mean.
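As a minimal sketch of the definition, the following Python computes s directly with the n − 1 denominator. The helper name `sample_std` and the data values are illustrative; the result is cross-checked against the standard library's `statistics.stdev`, which uses the same denominator.

```python
import math
import statistics


def sample_std(xs):
    """Sample standard deviation s, using the n - 1 denominator."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    # Sum of squared deviations from the sample mean
    ss = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(ss / (n - 1))


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_std(sample))         # sqrt(32 / 7), about 2.138
print(statistics.stdev(sample))   # same value from the standard library
```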

The reason for this definition is that $$s^2$$ is an unbiased estimator for the variance $$\sigma^2$$ of the underlying population, if that variance exists and the sample values are drawn independently with replacement. However, s estimates the population standard deviation $$\sigma$$ with negative bias; that is, s tends to underestimate $$\sigma$$.
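The negative bias of s can be seen empirically. The sketch below assumes a normal population purely for illustration (the unbiasedness of $$s^2$$ itself does not require normality), draws many independent samples of size n with a fixed seed, and averages both $$s^2$$ and s. The function `simulate_bias` and its parameters are hypothetical names, not part of any library.

```python
import math
import random


def simulate_bias(n, sigma=1.0, reps=20000, seed=0):
    """Average s**2 and s over many samples of size n.

    Illustrative assumption: X ~ N(0, sigma**2). The population mean does
    not affect s, so it is fixed at 0.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    total_var = 0.0
    total_sd = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)  # s**2
        total_var += var
        total_sd += math.sqrt(var)  # s
    return total_var / reps, total_sd / reps


mean_var, mean_sd = simulate_bias(n=5, sigma=2.0)
print(f"average s^2 = {mean_var:.3f}  (sigma^2 = 4)")
print(f"average s   = {mean_sd:.3f}  (sigma   = 2)")
```

With n = 5 the average of $$s^2$$ should land near $$\sigma^2 = 4$$, while the average of s falls visibly short of $$\sigma = 2$$.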

One explanation of why the square root of the sample variance is a biased estimator of the standard deviation is that the square root is a nonlinear function, and only linear (affine) functions commute with taking expectations. Since the square root is a concave function, it follows from Jensen's inequality that the square root of the sample variance underestimates $$\sigma$$ on average.
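Spelled out, the Jensen step combines the concavity of the square root with the unbiasedness of $$s^2$$:

$$\operatorname{E}[s] = \operatorname{E}\left[\sqrt{s^2}\right] \le \sqrt{\operatorname{E}\left[s^2\right]} = \sqrt{\sigma^2} = \sigma,$$

and the inequality is strict unless $$s^2$$ is almost surely constant.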

Bias correction
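When the random variable is normally distributed, a minor correction exists to eliminate the bias. To derive the correction, note that for normally distributed X, Cochran's theorem implies that $$\sqrt{n{-}1}\,s/\sigma$$ has a chi distribution with $$n-1$$ degrees of freedom. A chi-distributed variable with k degrees of freedom has mean $$\sqrt{2}\,\Gamma\left(\frac{k+1}{2}\right)\big/\Gamma\left(\frac{k}{2}\right)$$, so taking $$k=n-1$$ and dividing by $$\sqrt{n-1}/\sigma$$ gives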
 * $$\operatorname{E}[s] = c_4\sigma$$

where $$c_4$$ is a constant that depends on the sample size n as follows:
 * $$c_4=\sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} = 1 - \frac{1}{4n} - \frac{7}{32n^2} - O(n^{-3}),$$ where $$\Gamma(\cdot)$$ is the gamma function.
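A small Python sketch can evaluate $$c_4$$ from the gamma-function formula and compare it with the truncated series. It works with `math.lgamma` rather than `math.gamma` because `math.gamma` overflows for arguments above roughly 171, that is, for n in the hundreds. The names `c4` and `c4_series` are hypothetical.

```python
import math


def c4(n):
    """Bias-correction constant c4(n) for the sample standard deviation.

    Evaluates sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2) in log space
    so that large n does not overflow.
    """
    if n < 2:
        raise ValueError("c4 is defined for n >= 2")
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)


def c4_series(n):
    """Truncated expansion 1 - 1/(4n) - 7/(32 n^2); error is O(n^-3)."""
    return 1 - 1 / (4 * n) - 7 / (32 * n ** 2)


for n in (2, 5, 10, 30, 100):
    print(n, round(c4(n), 4), round(c4_series(n), 4))
```

For n = 2 the exact value reduces to $$\sqrt{2/\pi}$$, and the series is already accurate to several decimal places by n = 100.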

Thus an unbiased estimator of $$\sigma$$ is obtained by dividing s by $$c_4$$. Tables giving the value of $$c_4$$ for selected values of n may be found in most textbooks on statistical quality control. As n grows large, $$c_4$$ approaches 1, and even for small samples the correction is modest: for example, for $$n=10$$ the value of $$c_4$$ is about 0.9727, so $$s/c_4$$ exceeds s by less than 3%. This correction produces an unbiased estimator only for normally distributed X. When this condition is satisfied, another result involving $$c_4$$ is that the standard deviation of s is $$\sigma\sqrt{1-c_4^2}$$, since $$\operatorname{Var}(s) = \operatorname{E}[s^2] - (\operatorname{E}[s])^2 = \sigma^2 - c_4^2\sigma^2$$.
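Putting the pieces together, the following is a hedged sketch of the corrected estimator $$s/c_4$$, with a Monte Carlo check of both claims in this paragraph. The simulation assumes normal data, since that is the only case in which the correction is exact; `unbiased_std`, `check`, and their default arguments are hypothetical.

```python
import math
import random


def c4(n):
    """c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma."""
    if n < 2:
        raise ValueError("c4 is defined for n >= 2")
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def sample_std(xs):
    """Sample standard deviation s (n - 1 denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def unbiased_std(xs):
    """s / c4(n); unbiased for sigma only if the data are normally distributed."""
    return sample_std(xs) / c4(len(xs))


def check(n=5, sigma=2.0, reps=20000, seed=1):
    """Monte Carlo check, assuming an N(0, sigma**2) population."""
    rng = random.Random(seed)
    u = [unbiased_std([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)]
    mean_u = sum(u) / reps
    sd_u = math.sqrt(sum((v - mean_u) ** 2 for v in u) / (reps - 1))
    # s = c4 * (s / c4), so the spread of s is c4 times the spread of s / c4
    sd_s = c4(n) * sd_u
    return mean_u, sd_s, sigma * math.sqrt(1 - c4(n) ** 2)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(unbiased_std(sample))          # s / c4(8), slightly larger than s

corrected_mean, sd_s, sd_theory = check()
print(corrected_mean)                # should be close to sigma = 2
print(sd_s, sd_theory)               # simulated vs. sigma * sqrt(1 - c4**2)
```

Because s equals $$c_4$$ times the corrected value, the sketch recovers the spread of s by rescaling the spread of the corrected values rather than storing both lists.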