Scatter matrix

In multivariate statistics and probability theory, the scatter matrix is a statistic used to estimate the covariance matrix of a distribution, for instance the multivariate normal distribution. (The scatter matrix is unrelated to the scattering matrix of quantum mechanics.)

Definition
Given n samples of m-dimensional data, represented as the columns of the m-by-n matrix $$X=[\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n]$$, the sample mean is


 * $$\overline{\mathbf{x}} = \frac{1}{n}\sum_{j=1}^n \mathbf{x}_j$$

where $$\mathbf{x}_j$$ is the jth column of $$X\,$$.

The scatter matrix is the m-by-m positive semi-definite matrix


 * $$S = \sum_{j=1}^n (\mathbf{x}_j-\overline{\mathbf{x}})(\mathbf{x}_j-\overline{\mathbf{x}})'$$

where $${\,}'$$ denotes matrix transpose. The scatter matrix may be expressed more succinctly as


 * $$S = X\,C_n\,X'$$

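where $$C_n = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}'$$ is the n-by-n centering matrix, with $$I_n$$ the n-by-n identity matrix and $$\mathbf{1}$$ the n-dimensional column vector of ones.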
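The two expressions for the scatter matrix can be checked against each other numerically. The following is a minimal sketch in Python with NumPy, not a reference implementation; the function names `scatter_matrix_sum` and `scatter_matrix_centering` and the small data matrix are illustrative assumptions, and the centering matrix is taken to be the identity minus the matrix whose entries all equal 1/n.

```python
import numpy as np

def scatter_matrix_sum(X):
    """Scatter matrix as a sum of outer products of the centered columns of X."""
    m, n = X.shape
    x_bar = X.mean(axis=1, keepdims=True)    # sample mean, shape (m, 1)
    S = np.zeros((m, m))
    for j in range(n):
        d = X[:, [j]] - x_bar                # centered j-th column, shape (m, 1)
        S += d @ d.T                         # outer product d d'
    return S

def scatter_matrix_centering(X):
    """Scatter matrix as X C_n X', assuming C_n = I_n - (1/n) * ones((n, n))."""
    n = X.shape[1]
    C_n = np.eye(n) - np.ones((n, n)) / n
    return X @ C_n @ X.T

# Illustrative data (hypothetical): m = 2 dimensions, n = 4 samples as columns.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 4.0, 3.0]])

S_sum = scatter_matrix_sum(X)
S_centering = scatter_matrix_centering(X)
print(S_sum)
print(np.allclose(S_sum, S_centering))
```

For this data both functions give the matrix with 5 on the diagonal and 3 off the diagonal. The centering-matrix form is compact but builds an n-by-n matrix, so for large n it is usually cheaper to subtract the mean from the columns directly.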

Application
The maximum likelihood estimate, given n samples, for the covariance matrix of a multivariate normal distribution can be expressed as the normalized scatter matrix
 * $$C_{ML}=\frac{1}{n}S.$$
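As a hedged illustration of the normalization, the sketch below computes the scatter matrix divided by n and compares it with NumPy's `np.cov` called with `bias=True`, which selects the 1/n normalization. The helper name `ml_covariance` and the data matrix are assumptions made for the example.

```python
import numpy as np

def ml_covariance(X):
    """Normalized scatter matrix S / n for an m-by-n data matrix X (samples as columns)."""
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)   # subtract the sample mean from every column
    S = Xc @ Xc.T                            # scatter matrix
    return S / n

# Illustrative data (hypothetical): m = 2 dimensions, n = 4 samples as columns.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 4.0, 3.0]])

C_ml = ml_covariance(X)

# np.cov treats rows as variables by default, matching the m-by-n layout;
# bias=True normalizes by n rather than n - 1.
C_np = np.cov(X, bias=True)

print(C_ml)
print(np.allclose(C_ml, C_np))
```

Calling `np.cov` without `bias=True` divides by n − 1 instead, which gives the usual unbiased sample covariance rather than the maximum likelihood estimate.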

When the columns of $$X$$ are independently sampled from a multivariate normal distribution, $$S$$ has a Wishart distribution with n − 1 degrees of freedom.
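One consequence of the Wishart result can be checked by simulation. The sketch below assumes the standard property that a Wishart matrix with n − 1 degrees of freedom and scale matrix Σ has expectation (n − 1)Σ, so the average of S/(n − 1) over many repeated experiments should be close to Σ. The dimensions, the covariance `Sigma`, the seed, and the number of trials are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, trials = 2, 10, 20000
mean = np.zeros(m)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])               # hypothetical population covariance

# samples[t] holds the n draws of trial t as rows, i.e. the transpose of X.
samples = rng.multivariate_normal(mean, Sigma, size=(trials, n))   # shape (trials, n, m)
centered = samples - samples.mean(axis=1, keepdims=True)

# One scatter matrix per trial: S[t] = sum over j of d_j d_j'.
S = np.einsum('tni,tnj->tij', centered, centered)                  # shape (trials, m, m)

mean_S = S.mean(axis=0)
print(mean_S / (n - 1))                      # should be close to Sigma
```

Dividing the averaged scatter matrix by n instead of n − 1 would give values systematically smaller than `Sigma`, which reflects the bias of the maximum likelihood estimate.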