Bayesian multivariate linear regression

Consider a collection of m linear regression problems, each with n observations, related through a common set of predictor variables, collected in a matrix $$X$$, and through jointly normal errors $$\{\epsilon_{c}\}$$:


 * $$y_{1} = X\beta_{1} + \epsilon_{1},\,$$


 * $$y_{c} = X\beta_{c} + \epsilon_{c},\,$$


 * $$y_{m} = X\beta_{m} + \epsilon_{m},\,$$

where the subscript c denotes a column vector: $$y_{c}$$ and $$\epsilon_{c}$$ hold the n observations and errors of the c-th dependent variable, $$\beta_{c}$$ holds its k regression coefficients, and $$X$$ has one row per observation and one column per predictor.

The noise terms are jointly normal within each observation. That is, each row r collects an m-length vector of correlated observations, one on each of the dependent variables. Writing the rows as column vectors:


 * $$y_{r} = B^{T}x_{r} + \epsilon_{r},\,$$

where the noise $$\epsilon_{r}$$ is i.i.d. and normally distributed for all rows r:


 * $$\epsilon_{r} \sim N(0, \Sigma_{\epsilon}),\,$$

with $$\Sigma_{\epsilon}$$ the $$m \times m$$ error covariance matrix, and where B is a $$k \times m$$ matrix of coefficients:


 * $$B = [\beta_{1},\cdots,\beta_{c},\cdots, \beta_{m}]\,$$

We can write the entire regression problem in matrix form as:


 * $$Y = XB + E,\,$$

where Y and E are $$n \times m$$ matrices and X is the $$n \times k$$ matrix whose rows are the predictor vectors $$x_{r}^{T}$$.

The classical, frequentist linear least squares solution is to estimate the matrix of regression coefficients $$\hat{B}$$ using the Moore-Penrose pseudoinverse:


 * $$ \hat{B} = (X^{T}X)^{-1}X^{T}Y$$.
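The estimate can be checked numerically. The following is a minimal sketch, assuming X is stored as an n × k array and Y as an n × m array so that the model reads Y = XB + E. The sizes, the random seed, the covariance `Sigma`, and the helper name `least_squares_coefficients` are illustrative assumptions, not part of the text above.

```python
import numpy as np

def least_squares_coefficients(X, Y):
    """Return the k x m least squares estimate of B in Y = X B + E."""
    # lstsq solves all m column problems at once without forming (X^T X)^{-1}.
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B_hat

rng = np.random.default_rng(0)
n, k, m = 200, 3, 2                          # illustrative sizes (assumed)
X = rng.normal(size=(n, k))                  # common predictors
B_true = rng.normal(size=(k, m))             # coefficients used to simulate data
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])   # assumed error covariance
E = rng.multivariate_normal(np.zeros(m), Sigma, size=n)  # rows are i.i.d. errors
Y = X @ B_true + E

B_hat = least_squares_coefficients(X, Y)
B_pinv = np.linalg.pinv(X) @ Y               # same estimate via the pseudoinverse
```

When X has full column rank, `B_hat` and `B_pinv` agree with the closed form $$(X^{T}X)^{-1}X^{T}Y$$ up to floating-point error. The `lstsq` route is generally preferred numerically because it never inverts $$X^{T}X$$ explicitly.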

To obtain the Bayesian solution, we need to specify the conditional likelihood and then find the appropriate conjugate prior. As with the univariate case of linear Bayesian regression, we will find that we can specify a natural conditional conjugate prior (which is scale dependent).

Let us write our conditional likelihood as


 * $$\rho(E|\Sigma_{\epsilon}) \propto |\Sigma_{\epsilon}|^{-n/2} \exp(-\frac{1}{2} \mathrm{tr}(E \Sigma_{\epsilon}^{-1} E^{T}) ) .\,$$

Writing the error E in terms of Y, X, and B yields


 * $$\rho(Y|X,B,\Sigma_{\epsilon}) \propto |\Sigma_{\epsilon}|^{-n/2} \exp(-\frac{1}{2} \mathrm{tr}((Y-XB) \Sigma_{\epsilon}^{-1}(Y-XB)^{T}) ) .\,$$
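As an illustration, the logarithm of this likelihood can be evaluated up to an additive constant. This is a sketch under the assumption that X is n × k, B is k × m, and the error covariance is an m × m symmetric positive definite matrix. The function name `log_likelihood_kernel` and the simulated inputs are hypothetical.

```python
import numpy as np

def log_likelihood_kernel(Y, X, B, Sigma):
    """Log-likelihood of Y given X, B, Sigma, up to an additive constant."""
    n = Y.shape[0]
    resid = Y - X @ B                          # n x m error matrix E
    _, logdet = np.linalg.slogdet(Sigma)       # log|Sigma|, computed stably
    # tr(E Sigma^{-1} E^T) is the sum over rows r of e_r^T Sigma^{-1} e_r.
    weighted = np.linalg.solve(Sigma, resid.T).T   # E Sigma^{-1} (Sigma symmetric)
    quad = np.sum(resid * weighted)
    return -0.5 * n * logdet - 0.5 * quad

rng = np.random.default_rng(1)
n, k, m = 50, 3, 2                             # illustrative sizes (assumed)
X = rng.normal(size=(n, k))
B = rng.normal(size=(k, m))
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])     # assumed error covariance
Y = X @ B + rng.multivariate_normal(np.zeros(m), Sigma, size=n)

value = log_likelihood_kernel(Y, X, B, Sigma)
```

Solving against `Sigma` rather than inverting it, and summing row-wise quadratic forms rather than building the n × n matrix inside the trace, keeps the cost linear in n.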

We seek a natural conjugate prior, a joint density $$\rho(B,\Sigma_{\epsilon})$$ which is of the same functional form as the likelihood. Since the likelihood is quadratic in $$B$$, we re-write the likelihood so it is normal in $$(B-\hat{B})$$ (the deviation from the classical sample estimate).

Using the same technique as with Bayesian linear regression, we decompose the exponential term using a matrix form of the sum-of-squares technique. Here, however, we will also need matrix differential calculus (the Kronecker product and vectorization transformations).

First, let us apply the sum-of-squares decomposition to obtain a new expression for the likelihood:


 * $$\rho(Y|X,B,\Sigma_{\epsilon}) \propto |\Sigma_{\epsilon}|^{-(n-k)/2} \exp(-\frac{1}{2} \mathrm{tr}(S^{T}S\Sigma_{\epsilon}^{-1})) |\Sigma_{\epsilon}|^{-k/2} \exp(-\frac{1}{2} \mathrm{tr}((B-\hat{B})^{T} X^{T}X (B-\hat{B}) \Sigma_{\epsilon}^{-1}) ) ,\,$$

where S is the matrix of least squares residuals:


 * $$S = Y - X\hat{B}$$
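The sum-of-squares step rests on the residual matrix being orthogonal to the predictors. The sketch below checks this numerically, assuming X is stored as n × k so that the residual matrix is $$Y - X\hat{B}$$ and the cross terms $$X^{T}S$$ vanish. The data are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 40, 3, 2                       # illustrative sizes (assumed)
X = rng.normal(size=(n, k))
Y = rng.normal(size=(n, m))
B = rng.normal(size=(k, m))              # an arbitrary coefficient matrix

B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least squares estimate
S = Y - X @ B_hat                        # least squares residuals
D = B - B_hat                            # deviation from the estimate

# Full residual cross-product versus its two-part decomposition.
lhs = (Y - X @ B).T @ (Y - X @ B)
rhs = S.T @ S + D.T @ (X.T @ X) @ D

cross_term = X.T @ S                     # zero up to rounding error
```

Because `cross_term` is zero, `lhs` and `rhs` agree for any choice of `B`, which is what lets the exponent split into a part that depends only on the data and a part that is quadratic in $$(B-\hat{B})$$.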

We would like to develop a conditional form for the priors:


 * $$\rho(B,\Sigma_{\epsilon}) = \rho(\Sigma_{\epsilon})\rho(B|\Sigma_{\epsilon}),\,$$

where $$\rho(\Sigma_{\epsilon})$$ is an inverse-Wishart distribution and $$\rho(B|\Sigma_{\epsilon})$$ is some form of normal distribution in the matrix $$B$$. This is accomplished using the vectorization transformation, which converts the likelihood from a function of the matrices $$B, \hat{B}$$ to a function of the vectors $$\beta = \mathrm{vec}(B), \hat{\beta} = \mathrm{vec}(\hat{B})$$.

Write


 * $$\mathrm{tr}((B - \hat{B})^{T}X^{T}X(B - \hat{B})\Sigma_{\epsilon}^{-1}) = \mathrm{vec}(B - \hat{B})^{T}\mathrm{vec}(X^{T}X(B - \hat{B})\Sigma_{\epsilon}^{-1})$$

Since $$\Sigma_{\epsilon}^{-1}$$ is symmetric, the identity $$\mathrm{vec}(AZC) = (C^{T} \otimes A)\mathrm{vec}(Z)$$ gives


 * $$ \mathrm{vec}(X^{T}X(B - \hat{B})\Sigma_{\epsilon}^{-1}) = (\Sigma_{\epsilon}^{-1} \otimes X^{T}X )\mathrm{vec}(B - \hat{B}) $$

Then


 * $$\mathrm{tr}((B - \hat{B})^{T}X^{T}X(B - \hat{B})\Sigma_{\epsilon}^{-1}) = \mathrm{vec}(B - \hat{B})^{T}(\Sigma_{\epsilon}^{-1} \otimes X^{T}X )\mathrm{vec}(B - \hat{B})$$


 * $$ = (\beta-\hat{\beta})^{T}(\Sigma_{\epsilon}^{-1} \otimes X^{T}X )(\beta-\hat{\beta})$$

which will lead to a likelihood which is normal in $$(\beta - \hat{\beta})$$.
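The vectorization step can be verified numerically. The sketch below assumes the column-stacking convention for vec and a symmetric positive definite error covariance. The helper `vec` and all inputs are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(3)
n, k, m = 30, 3, 2                       # illustrative sizes (assumed)
X = rng.normal(size=(n, k))
D = rng.normal(size=(k, m))              # stands in for B - B_hat
A = rng.normal(size=(m, m))
Sigma = A @ A.T + np.eye(m)              # symmetric positive definite covariance
Sigma_inv = np.linalg.inv(Sigma)
XtX = X.T @ X

# Trace form of the quadratic term in the exponent.
trace_form = np.trace(D.T @ XtX @ D @ Sigma_inv)

# Equivalent quadratic form in the vectorized coefficients.
precision = np.kron(Sigma_inv, XtX)      # (k*m) x (k*m) matrix
quad_form = vec(D) @ precision @ vec(D)

# The intermediate identity vec(A Z C) = (C^T kron A) vec(Z).
vec_direct = vec(XtX @ D @ Sigma_inv)
vec_kron = precision @ vec(D)
```

The matrix `precision` plays the role of the inverse covariance of $$\beta$$ in the resulting normal kernel. The column-major reshape matters here: a row-major flattening would require swapping the order of the Kronecker factors.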

With the likelihood in a more tractable form, we can now find a natural (conditional) conjugate prior.

(to complete)

Example: