M-estimator

In statistics, M-estimators are a broad class of estimators obtained by minimizing certain functions of the data. The process of obtaining an M-estimator is called M-estimation.

Some authors instead define M-estimators as the root or roots of a system of equations built from certain functions of the data. Estimators of this kind form a subset of those defined by minimization. Typically, the functions in the equations are the derivatives of the functions minimized under the broader definition.

Many classical statistics can be shown to be M-estimators. The main use of M-estimators, however, is as robust alternatives to classical statistical estimators.

Historical motivation
For a family of probability density functions f parameterized by θ, the maximum likelihood estimate of θ (which may be vector-valued) is computed by maximizing the likelihood function over θ. The estimate is


 * $$\widehat{\theta} = \operatorname{argmax}_{\theta}{ \left( \prod_{i=1}^n f(x_i, \theta) \right) }\,\!$$

or, equivalently,


 * $$\widehat{\theta} = \operatorname{argmin}_{\theta}{ \left( -\sum_{i=1}^n \log{( f(x_i, \theta) ) }\right) }.\,\!$$

The performance of maximum likelihood estimators depends heavily on the assumed distribution family being at least approximately correct. In particular, maximum likelihood estimators can be inefficient and biased when the data do not come from the assumed distribution. Outliers are of particular concern.
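As an illustration of this sensitivity, the following minimal sketch minimizes the negative log-likelihood of a normal location model numerically. The unit variance, the sample values, and the helper names `neg_log_likelihood` and `argmin_convex` are assumptions made for the example and are not part of the original text.

```python
import math

def neg_log_likelihood(theta, xs):
    """Negative log-likelihood of the sample xs under N(theta, 1)."""
    return sum(0.5 * math.log(2 * math.pi) + 0.5 * (x - theta) ** 2 for x in xs)

def argmin_convex(func, lo, hi, tol=1e-9):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Hypothetical sample, and the same sample with one gross outlier added.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [100.0]

mle_clean = argmin_convex(lambda t: neg_log_likelihood(t, clean), 0.0, 200.0)
mle_contaminated = argmin_convex(
    lambda t: neg_log_likelihood(t, contaminated), 0.0, 200.0
)
```

Under this model the minimizer coincides with the sample mean, so the estimate is about 10.0 for the clean sample and about 25.0 once the single outlier is included.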

Definition
In 1964, Peter Huber proposed generalizing maximum likelihood estimation to the minimization of


 * $$\sum_{i=1}^n\rho(x_i, \theta),\,\!$$

where ρ is a function with certain properties (see below). The solutions


 * $$\hat{\theta} = \operatorname{argmin}_{\theta}\left(\sum_{i=1}^n\rho(x_i, \theta)\right) \,\!$$

are called M-estimators ("M" for "maximum likelihood-type"; Huber, 1981, page 43). Other types of robust estimator include L-estimators, R-estimators and S-estimators. Maximum likelihood estimators are thus the special case of M-estimators in which ρ(x, θ) = −log f(x, θ).

The function ρ, or its derivative ψ, can be chosen so that the estimator has desirable properties (in terms of bias and efficiency) when the data truly come from the assumed distribution, and 'not bad' behaviour when the data are generated from a model that is, in some sense, close to the assumed distribution.

Types of M-estimators
M-estimators are solutions θ which minimize


 * $$\sum_{i=1}^n\rho(x_i,\theta).\,\!$$

This minimization can always be done directly. Often it is simpler to differentiate the objective with respect to θ and solve for the root of the derivative, that is, to solve $$\sum_{i=1}^n\psi(x_i,\theta)=0$$ with $$\psi = \partial\rho/\partial\theta$$. When this differentiation is possible, the M-estimator is said to be of ψ-type. Otherwise, the M-estimator is said to be of ρ-type.

In most practical cases, the M-estimators are of ψ-type.
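The two routes can be compared on a small location problem. The sketch below uses Huber's ρ function with tuning constant k = 1.345 as an assumed example (the text above does not single out any particular ρ), and the helper names and sample values are hypothetical. The function `psi` is the derivative of `rho` with respect to θ.

```python
K = 1.345  # tuning constant; a conventional choice, assumed for this example

def rho(x, theta, k=K):
    """Huber's rho applied to the residual x - theta."""
    r = abs(x - theta)
    return 0.5 * r * r if r <= k else k * (r - 0.5 * k)

def psi(x, theta, k=K):
    """Derivative of rho with respect to theta (a clipped theta - x)."""
    return max(-k, min(k, theta - x))

def minimize_rho(xs, tol=1e-9):
    """rho-type route: ternary search on the convex objective sum(rho)."""
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sum(rho(x, m1) for x in xs) < sum(rho(x, m2) for x in xs):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def solve_psi(xs, tol=1e-12):
    """psi-type route: bisection on the nondecreasing function sum(psi)."""
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(psi(x, mid) for x in xs) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [9.8, 10.1, 10.0, 9.9, 10.2, 100.0]  # hypothetical sample with one outlier
theta_rho = minimize_rho(data)
theta_psi = solve_psi(data)
```

For this sample both routes give approximately 10.269, whereas the sample mean is 25.0. Bisection and ternary search are used only to keep the sketch dependency-free; any standard optimizer or root finder would serve.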

Computation
For many choices of ρ or ψ, no closed-form solution exists and an iterative approach to computation is required. It is possible to use standard function optimization algorithms, such as Newton–Raphson. However, in most cases an iteratively re-weighted least squares fitting algorithm can be used; this is typically the preferred method.

For some choices of ψ, specifically redescending functions, the solution may not be unique. The issue is particularly relevant in multivariate and regression problems, so some care is needed to choose good starting points. Robust starting points, such as the median as an estimate of location and the median absolute deviation as a univariate estimate of scale, are common.
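A minimal sketch of iteratively re-weighted least squares for a univariate location estimate is given below. It assumes Huber's ψ function with k = 1.345, starts from the median, and holds the scale fixed at the median absolute deviation multiplied by 1.4826 (the usual consistency factor for normal data). These choices, the function name `huber_location_irls`, and the sample values are assumptions for illustration and are not prescribed by the text.

```python
import statistics

def huber_location_irls(xs, k=1.345, tol=1e-10, max_iter=100):
    """Huber location estimate by iteratively re-weighted averaging."""
    # Robust starting point and fixed robust scale.
    theta = statistics.median(xs)
    mad = statistics.median([abs(x - theta) for x in xs])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; a different scale estimate is needed")

    for _ in range(max_iter):
        # Huber weights: 1 inside [-k, k], k / |r| outside.
        weights = []
        for x in xs:
            r = abs(x - theta) / scale
            weights.append(1.0 if r <= k else k / r)
        # The weighted least squares step for a location model is a weighted mean.
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

data = [9.8, 10.1, 10.0, 9.9, 10.2, 100.0]  # hypothetical sample with one outlier
estimate = huber_location_irls(data)
```

For this sample the estimate settles near 10.06, close to the bulk of the data. Because Huber's ψ is monotone rather than redescending, the starting point matters less here than it would for a redescending ψ.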

Distribution
It can be shown that, under suitable regularity conditions, M-estimators are asymptotically normally distributed. As such, Wald-type approaches to constructing confidence intervals and hypothesis tests can be used. However, since the theory is asymptotic, it will frequently be sensible to check the finite-sample distribution, perhaps by examining the permutation or bootstrap distribution.
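The comparison between a Wald-type interval and a bootstrap check might be sketched as follows, using the sample mean as the M-estimator. The sample values, the number of resamples, the seed, and the percentile method for the bootstrap interval are all assumptions made for the example; 1.96 is the usual normal quantile for a 95% interval.

```python
import math
import random
import statistics

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.4, 9.7, 10.3, 9.6, 10.0]  # hypothetical
n = len(data)
theta_hat = statistics.mean(data)

# Wald-type 95% interval based on asymptotic normality.
se = statistics.stdev(data) / math.sqrt(n)
wald = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)

# Bootstrap distribution: recompute the estimator on resamples drawn
# with replacement from the observed data.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
B = 2000
replicates = sorted(statistics.mean(rng.choices(data, k=n)) for _ in range(B))

# Percentile interval read off the sorted bootstrap replicates.
boot = (replicates[int(0.025 * B)], replicates[int(0.975 * B) - 1])
```

If the two intervals differ markedly, or a histogram of `replicates` looks far from normal, the asymptotic approximation may be unreliable at this sample size.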

Influence function
The influence function of an M-estimator of $$\psi$$-type is proportional to its defining $$\psi$$ function.

Let T be an M-estimator of ψ-type, and G be a probability distribution for which $$T(G)$$ is defined. Its influence function IF is


 * $$\operatorname{IF}(x;T,G) = -\frac{\psi(x,T(G))}{\int\left[\frac{\partial\psi(y,\theta)}{\partial\theta}\right]_{\theta=T(G)} \mathrm{d}G(y)}.$$

A proof of this property of M-estimators can be found in Huber (1981, Section 3.2).
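A quick numerical check is possible for the squared-error case, where ψ(x, θ) = θ − x. There ∂ψ/∂θ = 1, so the denominator, read as the expectation of ∂ψ/∂θ under G, equals 1 and the influence function reduces to x − T(G). The sketch below compares this with the finite-sample sensitivity curve, a standard empirical analogue of the influence function that is introduced here as an assumption. The helper names and sample values are hypothetical.

```python
def mean(xs):
    """Sample mean, the M-estimator with psi(x, theta) = theta - x."""
    return sum(xs) / len(xs)

def sensitivity_curve(estimator, xs, x):
    """Scaled change in the estimate when one observation x is added."""
    n = len(xs)
    return (n + 1) * (estimator(xs + [x]) - estimator(xs))

sample = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical sample
centre = mean(sample)

# For the mean, the sensitivity curve equals x - mean(sample), matching
# the influence function -psi(x, T(G)) / 1 = x - T(G).
points = [-5.0, 0.0, 10.0, 1000.0]
curve = [sensitivity_curve(mean, sample, x) for x in points]
expected = [x - centre for x in points]
```

The curve grows without bound in x, which reflects the unbounded ψ function of the mean and its lack of robustness to outliers.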

Applications
M-estimators can be constructed for location parameters and scale parameters in univariate and multivariate settings, as well as being used in robust regression.

Mean
Let (X1, ..., Xn) be a set of independent, identically distributed random variables, with distribution F.

If we define


 * $$\rho(x, \theta)=\frac{(x - \theta)^2}{2},\,\!$$

we note that $$\sum_{i=1}^n\rho(X_i,\theta)$$ is minimized when θ is the sample mean of the Xs. Thus the mean is an M-estimator of ρ-type, with this ρ function.

As this ρ function is continuously differentiable in θ, the mean is thus also an M-estimator of ψ-type for ψ(x, θ) = θ − x.