Centering matrix

In mathematics and multivariate statistics, the centering matrix is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the components of the vector from every component.

Definition
The centering matrix of size n is defined as the n-by-n matrix
 * $$C_n = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}'$$

where $$I_n\,$$ is the identity matrix of size n, $$\mathbf{1}$$ is the column-vector of n ones and where $${\,}'$$ denotes matrix transpose. For example


 * $$C_1 = \begin{bmatrix} 0 \end{bmatrix} ,\ C_2 = \left[ \begin{array}{rr} \frac{1}{2} & -\frac{1}{2} \\ \\ -\frac{1}{2} & \frac{1}{2} \end{array} \right] ,\ C_3 = \left[ \begin{array}{rrr} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{array} \right] $$
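
The definition translates directly into code. The following is a minimal sketch, not a reference implementation: the function name `centering_matrix` is chosen here for illustration, and exact rational arithmetic from Python's standard `fractions` module is assumed so that the entries match the fractions shown above.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact fractions."""
    off_diagonal = Fraction(-1, n)  # every entry of -(1/n) 1 1'
    return [
        [1 + off_diagonal if i == j else off_diagonal for j in range(n)]
        for i in range(n)
    ]


# Print C_3 row by row; the diagonal entries are 2/3, all others -1/3.
for row in centering_matrix(3):
    print([str(entry) for entry in row])
```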

Properties
Given a column-vector $$\mathbf{v}\,$$ of size n, the centering property of $$C_n\,$$ can be expressed as
 * $$C_n\,\mathbf{v} = \mathbf{v}-\left(\frac{1}{n}\mathbf{1}'\mathbf{v}\right)\mathbf{1}$$

where $$\frac{1}{n}\mathbf{1}'\mathbf{v}$$ is the mean of the components of $$\mathbf{v}\,$$.
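
As an illustration of the centering property, the sketch below multiplies a small example vector by $$C_3$$ and compares the result with subtracting the mean directly. The helper names `centering_matrix` and `mat_vec` and the sample vector are assumptions made for this example only.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact fractions."""
    off = Fraction(-1, n)
    return [[1 + off if i == j else off for j in range(n)] for i in range(n)]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


v = [Fraction(2), Fraction(4), Fraction(9)]
mean = sum(v) / len(v)  # (2 + 4 + 9) / 3 = 5

via_matrix = mat_vec(centering_matrix(len(v)), v)  # C_n v
via_subtraction = [x - mean for x in v]            # v - mean * 1

print(via_matrix == via_subtraction)  # True: both equal [-3, -1, 4]
```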

$$C_n\,$$ is symmetric positive semi-definite.

$$C_n\,$$ is idempotent, so that $$C_n^k=C_n$$, for $$k=1,2,\ldots$$. Once the mean has been removed, the mean of the result is zero, so removing it again has no effect.
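
Idempotence can also be checked directly from the definition, using $$\mathbf{1}'\mathbf{1}=n$$:
 * $$C_n^2 = \left(I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}'\right)^2 = I_n - \tfrac{2}{n}\mathbf{1}\mathbf{1}' + \tfrac{1}{n^2}\mathbf{1}(\mathbf{1}'\mathbf{1})\mathbf{1}' = I_n - \tfrac{2}{n}\mathbf{1}\mathbf{1}' + \tfrac{1}{n}\mathbf{1}\mathbf{1}' = C_n.$$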

$$C_n\,$$ is singular. The effects of applying the transformation $$C_n\,\mathbf{v}$$ cannot be reversed.

$$C_n\,$$ has the eigenvalue 1 with multiplicity n-1 and the eigenvalue 0 with multiplicity 1.

$$C_n\,$$ has a nullspace of dimension 1, along the vector $$\mathbf{1}$$.

$$C_n\,$$ is a projection matrix. That is, $$C_n\mathbf{v}$$ is the orthogonal projection of $$\mathbf{v}\,$$ onto the (n-1)-dimensional subspace that is orthogonal to the nullspace spanned by $$\mathbf{1}$$. (This is the subspace of all n-vectors whose components sum to zero.)
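
The nullspace and projection properties can be checked numerically on a small case. This is a sketch under stated assumptions: n = 4, the test vector and the helper names `centering_matrix` and `mat_vec` are chosen purely for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact fractions."""
    off = Fraction(-1, n)
    return [[1 + off if i == j else off for j in range(n)] for i in range(n)]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


n = 4
C = centering_matrix(n)
ones = [Fraction(1)] * n
v = [Fraction(3), Fraction(-1), Fraction(7), Fraction(1)]

projected = mat_vec(C, v)

# The vector of ones lies in the nullspace: C_n 1 = 0.
print(mat_vec(C, ones) == [0] * n)

# The projected vector has components summing to zero,
# i.e. it is orthogonal to the vector of ones.
print(sum(projected) == 0)

# Projecting a second time changes nothing (idempotence).
print(mat_vec(C, projected) == projected)
```

Each of the three checks prints `True` for this example.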

Application
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it forms an analytical tool that conveniently and succinctly expresses mean removal. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an m-by-n matrix $$X\,$$, the multiplication $$C_m\,X$$ removes the means from each of the n columns, while $$X\,C_n$$ removes the means from each of the m rows.
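
The difference between left and right multiplication can be seen on a small 2-by-3 example. The matrix `X` and the helpers `centering_matrix` and `mat_mul` below are illustrative assumptions, not part of any particular library.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact fractions."""
    off = Fraction(-1, n)
    return [[1 + off if i == j else off for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# A 2-by-3 example matrix (m = 2 rows, n = 3 columns).
X = [
    [Fraction(1), Fraction(2), Fraction(6)],
    [Fraction(3), Fraction(8), Fraction(4)],
]
m, n = len(X), len(X[0])

# C_m X subtracts each column's mean (2, 5, 5) from that column.
column_centered = mat_mul(centering_matrix(m), X)

# X C_n subtracts each row's mean (3, 5) from that row.
row_centered = mat_mul(X, centering_matrix(n))

print([[int(e) for e in row] for row in column_centered])  # [[-1, -3, 1], [1, 3, -1]]
print([[int(e) for e in row] for row in row_centered])     # [[-2, -1, 3], [-2, 3, -1]]
```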

The centering matrix provides in particular a succinct way to express the scatter matrix $$S=(X-\mu\mathbf{1}')(X-\mu\mathbf{1}')'$$ of a data sample $$X\,$$, an m-by-n matrix whose n columns are the observations, where $$\mu=\tfrac{1}{n}X\mathbf{1}$$ is the sample mean. Because $$X-\mu\mathbf{1}'=X\,C_n$$ and $$C_n\,$$ is symmetric and idempotent, the scatter matrix can be expressed more compactly as
 * $$S=X\,C_n(X\,C_n)'=X\,C_n\,C_n\,X\,'=X\,C_n\,X\,'.$$
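
The identity can be verified on a small data sample. In the sketch below, the 2-by-3 matrix `X` (three observations of a two-dimensional variable, stored as columns) and the helper names are assumptions for illustration; the scatter matrix is computed once from the definition and once as $$X\,C_n\,X\,'$$.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact fractions."""
    off = Fraction(-1, n)
    return [[1 + off if i == j else off for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


# Columns of X are the n = 3 observations.
X = [
    [Fraction(1), Fraction(2), Fraction(6)],
    [Fraction(3), Fraction(8), Fraction(4)],
]
n = len(X[0])

# Definition: S = (X - mu 1')(X - mu 1')', with mu = (1/n) X 1 (the row means).
mu = [sum(row) / n for row in X]
centered = [[x - mean for x in row] for row, mean in zip(X, mu)]
S_definition = mat_mul(centered, transpose(centered))

# Compact form: S = X C_n X'.
S_compact = mat_mul(mat_mul(X, centering_matrix(n)), transpose(X))

print(S_definition == S_compact)  # True: both equal [[14, -2], [-2, 14]]
```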