Hotelling's T-square distribution

In statistics, Hotelling's T-square statistic, named for Harold Hotelling, is a generalization of Student's t statistic that is used in multivariate hypothesis testing.

Hotelling's T-square statistic is defined as

$$t^2=m({\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}({\mathbf x}-{\mathbf\mu})$$

where $${\mathbf x}$$ and $${\mathbf\mu}$$ are column vectors of $$p$$ elements and $${\mathbf W}$$ is a $$p\times p$$ positive-definite matrix.

If $${\mathbf x}\sim N_p(\mu,{\mathbf V})$$ is a random variable with a multivariate Gaussian distribution and $${\mathbf W}\sim W_p(m,{\mathbf V})$$ (independent of $${\mathbf x}$$) has a Wishart distribution with the same non-singular variance matrix $$\mathbf V$$ and with $$m$$ degrees of freedom, then the distribution of $$t^2$$ is $$T^2(p,m)$$, Hotelling's T-square distribution with parameters $$p$$ and $$m$$. (In the sampling application below, $$m=n-1$$, where $$n$$ is the number of data points.) It can be shown that



$$\frac{m-p+1}{pm} T^2\sim F_{p,m-p+1}$$ where $$F$$ is the F-distribution.
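This relation to the F-distribution can be checked by simulation. The following is a minimal numpy/scipy sketch, not part of the original text; the dimension, sample size, and replication count are made-up illustrative choices. It draws many samples, forms the one-sample statistic discussed below, rescales it, and compares the result to the claimed F-distribution with a Kolmogorov–Smirnov statistic.

```python
import numpy as np
from scipy import stats

# Monte Carlo check of the T^2 -> F relation.
# Illustrative parameters (not from the text):
p, n, reps = 3, 10, 20000
m = n - 1
rng = np.random.default_rng(0)

mu = np.zeros(p)
t2 = np.empty(reps)
for i in range(reps):
    x = rng.standard_normal((n, p))      # n draws from N_p(0, I)
    xbar = x.mean(axis=0)
    W = np.cov(x, rowvar=False)          # sample covariance (n - 1 divisor)
    t2[i] = n * (xbar - mu) @ np.linalg.solve(W, xbar - mu)

# (m - p + 1) / (p m) * T^2 should follow F_{p, m-p+1}.
f_stat = (m - p + 1) / (p * m) * t2
ks = stats.kstest(f_stat, stats.f(dfn=p, dfd=m - p + 1).cdf)
print(ks.statistic)                      # small if the relation holds
```

With 20,000 replications the KS statistic should be on the order of 0.01 or less when the rescaled statistic really is F-distributed.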

Now suppose that


 * $${\mathbf x}_1,\dots,{\mathbf x}_n$$

are p×1 column vectors whose entries are real numbers. Let


 * $$\overline{\mathbf x}=(\mathbf{x}_1+\cdots+\mathbf{x}_n)/n$$

be their mean. Let the p×p positive-definite matrix


 * $${\mathbf W}=\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})'/(n-1)$$

be their "sample variance" matrix. (The transpose of any matrix M is denoted above by M′.) Let μ be some known p×1 column vector (in applications a hypothesized value of a population mean). Then Hotelling's T-square statistic is



$$t^2=n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}).$$

Note that $$t^2$$ is closely related to the squared Mahalanobis distance.
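This connection can be made concrete in a short numpy/scipy sketch, which is not part of the original text; the data and the hypothesized mean are made-up illustrative values. It computes $$t^2$$ directly from the quadratic form and again via `scipy.spatial.distance.mahalanobis`, showing that $$t^2$$ is $$n$$ times the squared Mahalanobis distance from $$\overline{\mathbf x}$$ to μ.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Made-up illustrative data: n = 20 points in p = 4 dimensions.
rng = np.random.default_rng(1)
x = rng.standard_normal((20, 4))
mu0 = np.zeros(4)                    # hypothesized mean (illustrative)
n = x.shape[0]

xbar = x.mean(axis=0)
W = np.cov(x, rowvar=False)          # sample covariance (n - 1 divisor)
VI = np.linalg.inv(W)

# Direct quadratic form:
t2_direct = n * (xbar - mu0) @ VI @ (xbar - mu0)
# n times the squared Mahalanobis distance -- the same quantity:
t2_mahal = n * mahalanobis(xbar, mu0, VI) ** 2

print(np.isclose(t2_direct, t2_mahal))
```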

In particular, it can be shown that if $${\mathbf x}_1,\dots,{\mathbf x}_n\sim N_p(\mu,{\mathbf V})$$ are independent, and $$\overline{\mathbf x}$$ and $${\mathbf W}$$ are as defined above, then $$(n-1){\mathbf W}$$ has a Wishart distribution with n − 1 degrees of freedom,


 * $$(n-1)\mathbf{W} \sim W_p(n-1,{\mathbf V}),$$

and is independent of $$\overline{\mathbf x}$$, and


 * $$\overline{\mathbf x}\sim N_p(\mu,V/n)$$

This implies that:


 * $$t^2 = n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}) \sim T^2(p, n-1).$$
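Combining this distributional result with the T²-to-F relation stated earlier gives a complete one-sample test of H₀: μ = μ₀. The sketch below is illustrative and not from the original text; the helper name `hotelling_one_sample` and the example data are hypothetical.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(x, mu0):
    """One-sample Hotelling T^2 test of H0: mean = mu0.

    Returns (t2, F statistic, p-value), using the relation
    (n - p) / (p (n - 1)) * T^2 ~ F_{p, n-p} under H0.
    """
    n, p = x.shape
    xbar = x.mean(axis=0)
    W = np.cov(x, rowvar=False)                 # sample covariance
    d = xbar - mu0
    t2 = n * d @ np.linalg.solve(W, d)
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_val = stats.f.sf(f_stat, p, n - p)        # upper-tail probability
    return t2, f_stat, p_val

# Made-up example: 30 points in 3 dimensions whose true mean
# differs from the hypothesized zero vector in one coordinate.
rng = np.random.default_rng(2)
x = rng.standard_normal((30, 3)) + np.array([0.0, 0.0, 1.0])
t2, f_stat, p_val = hotelling_one_sample(x, np.zeros(3))
print(p_val)                                    # small: H0 is false here
```

Here the F degrees of freedom follow from setting m = n − 1 in the general relation, so (m − p + 1)/(pm) becomes (n − p)/(p(n − 1)).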

Hotelling's two-sample T-square statistic
If $${\mathbf x}_1,\dots,{\mathbf x}_{n_x}\sim N_p(\mu,{\mathbf V})$$ and $${\mathbf y}_1,\dots,{\mathbf y}_{n_y}\sim N_p(\mu,{\mathbf V})$$ are samples drawn independently from two multivariate normal distributions with the same mean and covariance, and we define


 * $$\overline{\mathbf x}=\frac{1}{n_x}\sum_{i=1}^{n_x} \mathbf{x}_i \qquad \overline{\mathbf y}=\frac{1}{n_y}\sum_{i=1}^{n_y} \mathbf{y}_i$$

as the sample means, and
 * $${\mathbf W}= \frac{\sum_{i=1}^{n_x}(\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})' +\sum_{i=1}^{n_y}(\mathbf{y}_i-\overline{\mathbf y})(\mathbf{y}_i-\overline{\mathbf y})'}{n_x+n_y-2}$$

as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-square statistic is


 * $$t^2 = \frac{n_x n_y}{n_x+n_y}(\overline{\mathbf x}-\overline{\mathbf y})'{\mathbf W}^{-1}(\overline{\mathbf x}-\overline{\mathbf y}) \sim T^2(p, n_x+n_y-2)$$

and it can be related to the F-distribution by


 * $$\frac{n_x+n_y-p-1}{(n_x+n_y-2)p}t^2 \sim F(p,n_x+n_y-1-p).$$
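The two-sample construction above translates directly into code. The following numpy/scipy sketch is illustrative and not from the original text; the helper name `hotelling_two_sample` and the sample data are hypothetical. It builds the pooled covariance from the two scatter matrices, forms the statistic, and converts it to an F p-value with the relation just stated.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(x, y):
    """Two-sample Hotelling T^2 test of H0: equal means.

    Uses the pooled covariance estimate and the relation
    (n_x + n_y - p - 1) / ((n_x + n_y - 2) p) * T^2 ~ F_{p, n_x+n_y-1-p}.
    """
    nx, p = x.shape
    ny = y.shape[0]
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)
    # Pooled covariance: sum of the two scatter matrices over n_x + n_y - 2.
    Sx = (x - xbar).T @ (x - xbar)
    Sy = (y - ybar).T @ (y - ybar)
    W = (Sx + Sy) / (nx + ny - 2)
    d = xbar - ybar
    t2 = nx * ny / (nx + ny) * d @ np.linalg.solve(W, d)
    f_stat = (nx + ny - p - 1) / ((nx + ny - 2) * p) * t2
    p_val = stats.f.sf(f_stat, p, nx + ny - 1 - p)
    return t2, f_stat, p_val

# Made-up example: two samples whose true means differ by 1 in each
# coordinate, so the test should reject H0.
rng = np.random.default_rng(3)
x = rng.standard_normal((25, 2))
y = rng.standard_normal((35, 2)) + 1.0
t2, f_stat, p_val = hotelling_two_sample(x, y)
print(p_val)
```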