Multiple correlation

In statistics, regression analysis is a method for explaining phenomena and predicting future events. In regression analysis, the coefficient of correlation r between random variables X and Y is a quantitative index of the association between the two variables. Its square, the coefficient of determination r², indicates the proportion of variance in the criterion variable Y that is accounted for by variation in the predictor variable X. In multiple regression analysis, a set of predictor variables X1, X2, ... is used to explain the variability of the criterion variable Y. The multivariate counterpart of the coefficient of determination r² is the coefficient of multiple determination, R². The square root of the coefficient of multiple determination is the coefficient of multiple correlation, R.

Conceptualization of multiple correlation
An intuitive approach to multiple regression analysis is to sum the squared correlations between each predictor variable and the criterion variable to obtain an index of the overall relationship between the predictors and the criterion. However, such a sum is often greater than one, which shows that simple summation of squared correlation coefficients is not a correct procedure in general. It is correct only in the special case in which the predictor variables are uncorrelated with one another. If the predictors are correlated, their inter-correlations must be taken into account so that only the unique contribution of each predictor toward explaining the criterion is included.
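The contrast between the two cases can be illustrated numerically. The following is a minimal sketch using NumPy on simulated data; the sample size, seed, weights, and the helper names `r_squared` and `sum_squared_correlations` are assumptions made for illustration and are not part of the original text. It compares the sum of squared predictor-criterion correlations with the R² of an ordinary least-squares fit, once for predictors constructed to be exactly uncorrelated in the sample and once for correlated predictors.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility
n = 200                         # hypothetical sample size


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def sum_squared_correlations(X, y):
    """Sum of squared correlations between each column of X and y."""
    return sum(np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(X.shape[1]))


# Predictors that are exactly uncorrelated in the sample:
# the QR factor of a column-centered matrix has centered, orthogonal columns.
raw = rng.normal(size=(n, 2))
Q, _ = np.linalg.qr(raw - raw.mean(axis=0))
X_uncorr = Q * np.sqrt(n)

# Correlated predictors: the second column mixes in the first (correlation 0.8).
X_corr = np.column_stack(
    [X_uncorr[:, 0], 0.8 * X_uncorr[:, 0] + 0.6 * X_uncorr[:, 1]]
)

weights = np.array([0.6, 0.5])  # hypothetical regression weights
noise = rng.normal(size=n)
y_uncorr = X_uncorr @ weights + noise
y_corr = X_corr @ weights + noise

ssc_uncorr = sum_squared_correlations(X_uncorr, y_uncorr)
r2_uncorr = r_squared(X_uncorr, y_uncorr)
ssc_corr = sum_squared_correlations(X_corr, y_corr)
r2_corr = r_squared(X_corr, y_corr)

print("uncorrelated predictors:", ssc_uncorr, r2_uncorr)
print("correlated predictors:  ", ssc_corr, r2_corr)
```

With uncorrelated predictors the two quantities agree up to floating-point error. With correlated predictors the simple sum overstates the variance explained, because the variance shared by the predictors is counted more than once.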

Fundamental equation of multiple regression analysis
Initially, a matrix of correlations R is computed for all variables involved in the analysis. This matrix can be conceptualized as a supermatrix (a partitioned matrix) consisting of the vector c of cross-correlations between the predictor variables and the criterion variable, its transpose c', and the matrix Rxx of inter-correlations between the predictor variables. The fundamental equation of multiple regression analysis is


    R² = c' Rxx⁻¹ c.

The expression on the left side is the coefficient of multiple determination (the squared coefficient of multiple correlation). The expressions on the right side are the transposed vector of cross-correlations c', the inverse of the matrix of inter-correlations Rxx (cf. matrix inversion), and the vector of cross-correlations c. Premultiplying the vector of cross-correlations by its transpose changes the coefficients of correlation into coefficients of determination. The inverted matrix of inter-correlations removes the redundant variance arising from the inter-correlations among the predictor variables. These non-redundant contributions are summed to obtain the coefficient of multiple determination R². The square root of this coefficient is the coefficient of multiple correlation R.
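The computation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function name `multiple_r_squared`, the convention that the criterion occupies the last row and column of the full correlation matrix, and the correlation values themselves are all assumptions made for the example.

```python
import numpy as np


def multiple_r_squared(R_full, criterion=-1):
    """Return c' Rxx^{-1} c from a full correlation matrix.

    R_full    : square correlation matrix of all variables
    criterion : index of the criterion variable (assumed last by default)
    """
    R_full = np.asarray(R_full, dtype=float)
    k = R_full.shape[0]
    crit = criterion % k
    pred = [i for i in range(k) if i != crit]

    # Partition the supermatrix into its blocks.
    c = R_full[pred, crit]            # cross-correlations predictor-criterion
    Rxx = R_full[np.ix_(pred, pred)]  # inter-correlations among predictors

    # Solve Rxx b = c instead of forming the inverse explicitly;
    # the result c' b equals c' Rxx^{-1} c.
    return float(c @ np.linalg.solve(Rxx, c))


# Hypothetical correlations: two predictors (rows 0 and 1), criterion in row 2.
R_full = np.array([
    [1.0, 0.4, 0.6],
    [0.4, 1.0, 0.5],
    [0.6, 0.5, 1.0],
])

r2 = multiple_r_squared(R_full)
R_mult = np.sqrt(r2)  # coefficient of multiple correlation
print(r2, R_mult)
```

For these made-up values, the sum of the squared cross-correlations is 0.36 + 0.25 = 0.61, whereas the coefficient of multiple determination is 0.37 / 0.84, or about 0.44. The difference reflects the variance that the two predictors share. Solving the linear system rather than inverting Rxx is a numerical convenience and gives the same result when Rxx is non-singular.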