Wishart distribution

In statistics, the Wishart distribution, named in honor of John Wishart, is any of a family of probability distributions for nonnegative-definite matrix-valued random variables ("random matrices"). These distributions are of great importance in the estimation of covariance matrices in multivariate statistics.

Definition
Suppose X is an n &times; p matrix, each row of which is independently drawn from a p-variate normal distribution with zero mean:


 * $$X_{(i)}{=}(x_i^1,\dots,x_i^p)^T\sim N_p(0,V),$$

Then the Wishart distribution is the probability distribution of the p&times;p random matrix


 * $$S={X}^{T}{X}, \,\!$$

where S is known as the scatter matrix. One indicates that S has that probability distribution by writing


 * $$S\sim W_p(V,n).$$

The positive integer n is the number of degrees of freedom. Sometimes this is written W(V, p, n).

If p = 1 and V = 1 then this distribution is a chi-square distribution with n degrees of freedom.
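The construction above is easy to reproduce numerically. The following sketch (values chosen for illustration) draws the rows of X from N_p(0, V), forms the scatter matrix S = X&#x1D40;X, and uses a Monte Carlo average to make the fact E[S] = nV visible:

```python
import numpy as np

# Illustrative sketch: each row of X is an independent N_p(0, V) draw,
# and the scatter matrix S = X^T X is one sample from W_p(V, n).
rng = np.random.default_rng(0)
n, p = 50, 3
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])   # a positive-definite scale matrix

X = rng.multivariate_normal(np.zeros(p), V, size=n)   # n x p data matrix
S = X.T @ X                                           # S ~ W_p(V, n)

# S is symmetric and (almost surely) positive definite, and E[S] = n V,
# which a Monte Carlo average over many scatter matrices makes visible:
reps = 2000
S_mean = np.zeros((p, p))
for _ in range(reps):
    Xr = rng.multivariate_normal(np.zeros(p), V, size=n)
    S_mean += Xr.T @ Xr
S_mean /= reps

print(np.max(np.abs(S_mean / n - V)))   # small; E[S]/n = V
```

Note that S is built from n independent rows, so a single draw costs O(np&#xB2;) work; the averaging loop is only there to exhibit the expectation.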

Occurrence
The Wishart distribution arises frequently in likelihood-ratio tests in multivariate statistical analysis. It also arises in the spectral theory of random matrices.

Probability density function
The Wishart distribution can be characterized by its probability density function, as follows.

Let W be a p &times; p symmetric matrix of random variables that is positive definite. Let V be a (fixed) positive definite matrix of size p &times; p.

Then, if n &ge; p, W has a Wishart distribution with n degrees of freedom if it has a probability density function fW given by



 * $$f_{\mathbf W}(w)= \frac{ \left|w\right|^{(n-p-1)/2} \exp\left[ - \mathrm{trace}({\mathbf V}^{-1}w/2 )\right] }{ 2^{np/2}\left|{\mathbf V}\right|^{n/2}\Gamma_p(n/2) }$$

where &Gamma;p(&middot;) is the multivariate gamma function defined as



 * $$\Gamma_p(n/2)= \pi^{p(p-1)/4}\prod_{j=1}^p \Gamma\left[ (n+1-j)/2\right].$$

In fact the above definition can be extended to any real n > p &minus; 1.
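A direct transcription of this density (on the log scale, for numerical stability) can be checked against `scipy.stats.wishart`. The function name and the test values below are illustrative, not part of any library:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln  # log of the multivariate gamma

# Hand-rolled log-density of W_p(V, n), transcribed from the formula above.
def wishart_logpdf(w, V, n):
    p = V.shape[0]
    _, logdet_w = np.linalg.slogdet(w)
    _, logdet_V = np.linalg.slogdet(V)
    log_num = 0.5 * (n - p - 1) * logdet_w \
              - 0.5 * np.trace(np.linalg.solve(V, w))   # -trace(V^{-1} w)/2
    log_den = 0.5 * n * p * np.log(2) + 0.5 * n * logdet_V \
              + multigammaln(0.5 * n, p)                # log Gamma_p(n/2)
    return log_num - log_den

V = np.array([[1.0, 0.3], [0.3, 2.0]])
w = np.array([[4.0, 1.0], [1.0, 6.0]])
n = 5
print(wishart_logpdf(w, V, n))
print(wishart.logpdf(w, df=n, scale=V))   # should agree closely
```

Working with `slogdet` and `solve` rather than explicit determinants and inverses keeps the computation stable for larger p.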

Characteristic function
The characteristic function of the Wishart distribution is



 * $$\Theta \mapsto \left|{\mathbf I} - 2i\,{\mathbf\Theta}{\mathbf V}\right|^{-n/2}.$$

In other words,


 * $$\Theta \mapsto {\mathcal E}\left\{\mathrm{exp}\left[i\cdot\mathrm{trace}({\mathbf W}{\mathbf\Theta})\right]\right\} = \left|{\mathbf I} - 2i{\mathbf\Theta}{\mathbf V}\right|^{-n/2}$$

where $${\mathcal E}(\cdot)$$ denotes expectation.
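This identity lends itself to a Monte Carlo sanity check: average exp[i&middot;trace(W&Theta;)] over many Wishart draws and compare with the closed form. All numerical values here are chosen purely for illustration:

```python
import numpy as np

# Monte Carlo estimate of E{exp[i trace(W Theta)]} for W ~ W_p(V, n),
# compared against the closed form |I - 2i Theta V|^{-n/2}.
rng = np.random.default_rng(3)
p, n = 2, 6
V = np.array([[1.0, 0.2], [0.2, 0.5]])
Theta = np.array([[0.05, 0.01], [0.01, 0.08]])   # small symmetric argument

vals = []
for _ in range(20000):
    X = rng.multivariate_normal(np.zeros(p), V, size=n)
    W = X.T @ X                                  # one Wishart sample
    vals.append(np.exp(1j * np.trace(W @ Theta)))
mc = np.mean(vals)

closed = np.linalg.det(np.eye(p) - 2j * Theta @ V) ** (-n / 2)
print(mc, closed)   # the two complex numbers should be close
```

Since each summand has modulus 1, the Monte Carlo error shrinks like 1/&radic;N regardless of V and &Theta;.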

Theorem
If $$\scriptstyle {\mathbf W}$$ has a Wishart distribution with m degrees of freedom and variance matrix $$\scriptstyle {\mathbf V}$$&mdash;write $$\scriptstyle {\mathbf W}\sim{\mathbf W}_p({\mathbf V},m)$$&mdash;and $$\scriptstyle{\mathbf C}$$ is a q &times; p matrix of rank q, then



 * $${\mathbf C}{\mathbf W}{\mathbf C'} \sim {\mathbf W}_q\left({\mathbf C}{\mathbf V}{\mathbf C'},m\right).$$

Corollary 1
If $${\mathbf z}$$ is a nonzero $$p\times 1$$ constant vector, then $${\mathbf z'}{\mathbf W}{\mathbf z}\sim\sigma_z^2\chi_m^2$$.

In this case, $$\chi_m^2$$ is the chi-square distribution and $$\sigma_z^2={\mathbf z'}{\mathbf V}{\mathbf z}$$ (note that $$\sigma_z^2$$ is a constant; it is positive because $${\mathbf V}$$ is positive definite).
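Corollary 1 is also easy to verify by simulation: for a fixed z, the normalized quadratic form z&prime;Wz / (z&prime;Vz) should have mean m and variance 2m, like any &chi;&sup2;&#x2098; variable. The setup below is illustrative:

```python
import numpy as np

# Monte Carlo check of Corollary 1: z' W z / (z' V z) ~ chi-square(m),
# so its sample mean should be near m and its sample variance near 2m.
rng = np.random.default_rng(1)
m, p = 8, 3
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
z = np.array([1.0, -2.0, 0.5])
sigma2 = z @ V @ z            # the constant sigma_z^2 = z' V z

samples = np.empty(5000)
for k in range(samples.size):
    X = rng.multivariate_normal(np.zeros(p), V, size=m)
    W = X.T @ X               # W ~ W_p(V, m)
    samples[k] = z @ W @ z / sigma2

print(samples.mean(), samples.var())   # near m and 2m respectively
```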

Corollary 2
Consider the case where $${\mathbf z'}=(0,\ldots,0,1,0,\ldots,0)$$ (that is, the j-th element is one and all others zero). Then corollary 1 above shows that



 * $$w_{jj}\sim\sigma_{jj}\chi^2_m$$

gives the marginal distribution of each of the elements on the matrix's diagonal.

Noted statistician George Seber points out that the Wishart distribution is not called the "multivariate chi-square distribution" because the marginal distribution of the off-diagonal elements is not chi-square. Seber prefers to reserve the term multivariate for the case when all univariate marginals belong to the same family.

Estimator of the multivariate normal distribution
The Wishart distribution is the probability distribution of the maximum-likelihood estimator (MLE) of the covariance matrix of a multivariate normal distribution. The derivation of the MLE is perhaps surprisingly subtle and elegant: it involves the spectral theorem and the observation that it can be better to view a scalar as the trace of a 1&times;1 matrix than as a mere scalar. See estimation of covariance matrices.

Drawing values from the distribution
The following procedure is due to Smith &amp; Hocking. One can sample random p &times; p matrices from a p-variate Wishart distribution with scale matrix $${\textbf V}$$ and n degrees of freedom (for $$n \geq p$$) as follows:


 * 1) Generate a random p &times; p lower triangular matrix $${\textbf A}$$ such that:
 *  $$a_{ii}=(\chi^2_{n-i+1})^{1/2}$$, i.e. $$a_{ii}$$ is the square root of a sample from a chi-square distribution $$\chi^2_{n-i+1}$$;
 *  $$a_{ij}$$, for $$j<i$$, is sampled from a standard normal distribution $$N_1(0,1)$$.
 * 2) Compute the Cholesky decomposition $${\textbf V} = {\textbf L}{\textbf L}^T$$.
 * 3) Compute the matrix $${\textbf X} = {\textbf L}{\textbf A}{\textbf A}^T{\textbf L}^T$$. At this point, $${\textbf X}$$ is a sample from the Wishart distribution $$W_p({\textbf V},n)$$.

Note that if $${\textbf V}={\textbf I}$$, the identity matrix, then the sample can be obtained directly as $${\textbf X} = {\textbf A}{\textbf A}^T$$, since the Cholesky factor of $${\textbf V}={\textbf I}$$ is the identity itself.
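The procedure above can be sketched in Python, assuming numpy; the function name is illustrative, and variable names mirror the text:

```python
import numpy as np

# Sample W_p(V, n) via the lower-triangular construction described above.
def sample_wishart(V, n, rng):
    p = V.shape[0]
    A = np.zeros((p, p))
    for i in range(p):                       # 0-based index, so df = n - i
        # diagonal: square root of a chi-square draw with n - i + 1 d.o.f.
        # in the text's 1-based indexing
        A[i, i] = np.sqrt(rng.chisquare(n - i))
        # below-diagonal: independent standard normals
        A[i, :i] = rng.standard_normal(i)
    L = np.linalg.cholesky(V)                # V = L L^T
    LA = L @ A
    return LA @ LA.T                         # X = L A A^T L^T ~ W_p(V, n)

rng = np.random.default_rng(2)
V = np.array([[2.0, 0.5], [0.5, 1.0]])
n = 10
mean_est = sum(sample_wishart(V, n, rng) for _ in range(4000)) / 4000
print(mean_est)   # approaches E[X] = n V as the number of draws grows
```

This construction needs only p chi-square draws, p(p&minus;1)/2 normal draws, and one Cholesky factorization per batch, which is far cheaper for large n than forming an n &times; p Gaussian matrix and its scatter matrix.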