James-Stein estimator

The James-Stein estimator is a nonlinear estimator of the mean of a Gaussian random vector, and can be shown to dominate, i.e., uniformly outperform in terms of mean squared error, the "ordinary" (least squares) technique. As such, it is the best-known example of Stein's phenomenon.

An earlier version of the estimator was developed by Charles Stein in 1956, and is sometimes referred to as Stein's estimator. The result was improved by Willard James and Charles Stein in 1961.

Setting
Suppose $$\boldsymbol \theta$$ is an unknown parameter vector of length $$m$$, and let $$\mathbf y$$ be observations of the parameter vector, such that

$${\mathbf y} \sim N({\boldsymbol \theta}, \sigma^2 I).$$

We are interested in obtaining an estimate $$\widehat{\boldsymbol \theta} = \widehat{\boldsymbol \theta}({\mathbf y})$$ of $$\boldsymbol \theta$$, based on the observations $$\mathbf y$$.

This is an everyday situation in which a set of parameters is measured, and the measurements are corrupted by independent Gaussian noise. Since the noise has zero mean, it is very reasonable to use the measurements themselves as an estimate of the parameters. This is the approach of the least squares estimator, which simply equals $$\widehat{\boldsymbol \theta}_{LS} = {\mathbf y}$$ in this case.

There was therefore considerable shock and disbelief when Stein demonstrated that, in terms of mean squared error $$E \{ \| {\boldsymbol \theta}-\widehat {\boldsymbol \theta} \|^2 \}$$, this approach is suboptimal whenever $$m \ge 3$$. The result became known as Stein's phenomenon.

The James-Stein estimator
The James-Stein estimator is given by

$$\widehat{\boldsymbol \theta}_{JS} = \left( 1 - \frac{(m-2) \sigma^2}{\|{\mathbf y}\|^2} \right) {\mathbf y}.$$

James and Stein showed that the above estimator dominates $$\widehat{\boldsymbol \theta}_{LS}$$ for any $$m \ge 3$$, meaning that the James-Stein estimator achieves lower total MSE than the least squares estimator for every value of $$\boldsymbol \theta$$.
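
This dominance can be checked numerically. The following sketch (the function name, seed, and simulation parameters are illustrative, not from the original) compares the Monte Carlo risk of the two estimators for one fixed $$\boldsymbol \theta$$ with $$\sigma^2 = 1$$:

```python
import numpy as np

def james_stein(y, sigma2=1.0):
    """James-Stein estimate: shrink y toward the origin.

    Implements theta_JS = (1 - (m - 2) * sigma2 / ||y||^2) * y, for m >= 3.
    """
    m = y.shape[0]
    return (1.0 - (m - 2) * sigma2 / np.sum(y ** 2)) * y

rng = np.random.default_rng(0)
m, trials = 10, 20000
theta = rng.normal(size=m)            # an arbitrary fixed "true" parameter vector
err_ls = err_js = 0.0
for _ in range(trials):
    y = theta + rng.normal(size=m)    # y ~ N(theta, I), i.e. sigma^2 = 1
    err_ls += np.sum((y - theta) ** 2)
    err_js += np.sum((james_stein(y) - theta) ** 2)
mse_ls = err_ls / trials              # theory: exactly m * sigma^2 = 10
mse_js = err_js / trials              # strictly smaller, by Stein's phenomenon
```

The least squares risk equals $$m \sigma^2$$ regardless of $$\boldsymbol \theta$$, while the James-Stein risk falls below it, with the largest gap when $$\|\boldsymbol \theta\|$$ is small.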

Stein also showed that, for $$m \le 2$$, the least squares estimator is admissible, meaning that no estimator dominates it.

Interpretation
A consequence of the above discussion is the following counterintuitive result: when three or more unrelated parameters are measured, their total MSE can be reduced by using a combined estimator such as the James-Stein estimator; whereas when each parameter is estimated separately, the least squares (LS) estimator is admissible. This quirk has caused some to sarcastically ask whether, in order to estimate the speed of light, one should jointly estimate tea consumption in Taiwan and hog weight in Montana.

The response is that the James-Stein estimator always improves the total MSE, i.e., the sum of the expected squared errors of the components. Therefore, the total MSE in measuring light speed, tea consumption and hog weight would improve by using the James-Stein estimator. However, any particular component (such as the speed of light) would improve for some parameter values and deteriorate for others. Thus, although the James-Stein estimator dominates the LS estimator when three or more parameters are estimated, no single component dominates the respective component of the LS estimator.
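
The component-wise behavior can also be simulated. In the hypothetical sketch below (all parameter values are invented for illustration), one "outlying" component plays the role of the speed of light while the others sit near zero; the total MSE improves, but the outlying component is typically estimated less accurately than its near-zero neighbors, and may even fare worse than under least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
m, trials = 5, 50000
theta = np.array([4.0, 0.0, 0.0, 0.0, 0.0])   # one outlying component
se_ls = np.zeros(m)                            # accumulated per-component squared errors
se_js = np.zeros(m)
for _ in range(trials):
    y = theta + rng.normal(size=m)             # sigma^2 = 1
    shrink = 1.0 - (m - 2) / np.sum(y ** 2)    # James-Stein shrinkage factor
    se_ls += (y - theta) ** 2
    se_js += (shrink * y - theta) ** 2
mse_ls, mse_js = se_ls / trials, se_js / trials
# Total risk: sum(mse_js) < sum(mse_ls) = m, as dominance guarantees.
# Per component: mse_js[0] exceeds the risk of the near-zero components,
# and may exceed the least squares value of 1 for this configuration.
```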

The conclusion from this hypothetical example is that measurements should be combined if one is interested in minimizing their total MSE. For example, in a telecommunication setting, it is reasonable to combine channel tap measurements in a channel estimation scenario, as the goal is to minimize the total channel estimation error. Conversely, it is probably not reasonable to combine channel estimates of different users, since no user would want their channel estimate to deteriorate in order to improve the average network performance.

Extensions
The James-Stein estimator may seem at first sight to be a result of some peculiarity of the problem setting. In fact, the estimator exemplifies a very wide-ranging effect, namely, the fact that the "ordinary" or least squares estimator is often inadmissible for simultaneous estimation of several parameters. This effect has been called Stein's phenomenon, and has been demonstrated for several different problem settings, some of which are briefly outlined below.
 * James and Stein demonstrated that the estimator presented above can still be used when the variance $$\sigma^2$$ is unknown, by replacing it with the standard estimator of the variance, $$\widehat{\sigma}^2 = \frac{1}{n}\sum ( y_i-\overline{y} )^2$$. The dominance result still holds under the same condition, namely, $$m \ge 3$$.
 * Bock extended the work of James and Stein to the case of a general measurement covariance matrix, i.e., where measurements may be statistically dependent and may have differing variances. A similar dominating estimator can be constructed, with a suitably generalized dominance condition. This can be used to construct a linear regression technique which outperforms the standard application of the LS estimator.
 * Stein's result was substantially extended by Brown to a wide class of distributions and loss functions. However, his theorem is an existence result only, in that explicit dominating estimators were not actually exhibited. It is quite difficult to obtain explicit estimators improving upon the usual estimator without specific restrictions on the underlying distributions.
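
The first extension above, with unknown variance, can be sketched numerically as follows. This is an illustration only: the parameter vector, the true variance, and the choice $$n = m$$ in the plug-in variance estimate are assumptions made here for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, trials = 10, 20000
sigma2 = 4.0                                   # true noise variance, unknown to the estimator
theta = np.linspace(0.0, 1.0, m)               # illustrative parameter vector
err_ls = err_js = 0.0
for _ in range(trials):
    y = theta + rng.normal(scale=np.sqrt(sigma2), size=m)
    sigma2_hat = np.mean((y - y.mean()) ** 2)  # plug-in variance estimate (here n = m)
    shrink = 1.0 - (m - 2) * sigma2_hat / np.sum(y ** 2)
    err_ls += np.sum((y - theta) ** 2)
    err_js += np.sum((shrink * y - theta) ** 2)
mse_ls = err_ls / trials                       # close to m * sigma^2 = 40
mse_js = err_js / trials                       # smaller: shrinkage still helps with estimated variance
```

In this configuration the components of $$\boldsymbol \theta$$ are small relative to the noise, so the estimated-variance James-Stein rule shrinks heavily and the total MSE drops well below the least squares risk.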