Correlation ratio

In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample.

Suppose each observation is $$y_{xi}$$, where $$x$$ indicates the category that the observation is in and $$i$$ labels the particular observation. Let $$n_x$$ be the number of observations in category $$x$$ (not necessarily the same for different values of $$x$$), and let
 * $$\overline{y}_x=\frac{\sum_i y_{xi}}{n_x}$$ and $$\overline{y}=\frac{\sum_x n_x \overline{y}_x}{\sum_x n_x}$$

where $$\overline{y}_x$$ is the mean of category $$x$$ and $$\overline{y}$$ is the mean of the whole population. The correlation ratio η (eta) is then defined so as to satisfy


 * $$\eta^2 = \frac{\sum_x n_x (\overline{y}_x-\overline{y})^2}{\sum_{xi} (y_{xi}-\overline{y})^2}$$

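which can be written as


 * $$\eta^2 = \frac{{\sigma_{\overline{y}}}^2}{{\sigma_{y}}^2},$$

where $${\sigma_{\overline{y}}}^2$$ is the variance of the category means, weighted by the $$n_x$$, and $${\sigma_{y}}^2$$ is the variance of all the observations.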
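The definition can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `correlation_ratio_squared` and the toy data are assumptions made for the example, not part of any standard library.

```python
from collections import defaultdict


def correlation_ratio_squared(categories, values):
    """Return eta^2: between-category sum of squares over total sum of squares.

    `categories` and `values` are parallel sequences (hypothetical interface).
    """
    # Group the observations y_xi by their category x.
    groups = defaultdict(list)
    for x, y in zip(categories, values):
        groups[x].append(y)

    overall_mean = sum(values) / len(values)

    # Numerator: sum over categories of n_x * (mean_x - overall_mean)^2.
    between = sum(
        len(ys) * (sum(ys) / len(ys) - overall_mean) ** 2
        for ys in groups.values()
    )
    # Denominator: sum over all observations of (y_xi - overall_mean)^2.
    total = sum((y - overall_mean) ** 2 for y in values)
    if total == 0:
        raise ValueError("all observations are equal; eta^2 is undefined")
    return between / total


# Toy data: two categories with means 2 and 6, overall mean 4.
toy_eta_squared = correlation_ratio_squared(["a", "a", "b", "b"], [1, 3, 5, 7])
print(toy_eta_squared)  # between = 16, total = 20, so this prints 0.8
```

Raising an error when every observation is equal is a design choice of this sketch; the definition itself leaves that case undefined because the denominator is zero.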

If the relationship between values of $$x$$ and values of $$\overline{y}_x$$ is linear (which is certainly true when there are only two possibilities for $$x$$), this gives the same result as the square of the correlation coefficient; otherwise the correlation ratio is larger in magnitude, though still no more than 1. It can therefore be used for judging non-linear relationships.
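The comparison with the correlation coefficient can be checked numerically. The sketch below uses made-up data with three numeric categories; the helper names `eta_squared` and `pearson_r` are hypothetical. In the first data set the category means (2, 4, 6) are linear in $$x$$, and in the second (6, 2, 6) they are not.

```python
from collections import defaultdict
from math import sqrt


def eta_squared(xs, ys):
    """Correlation ratio squared, with xs as category labels."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    mean_y = sum(ys) / len(ys)
    between = sum(len(g) * (sum(g) / len(g) - mean_y) ** 2 for g in groups.values())
    total = sum((y - mean_y) ** 2 for y in ys)
    return between / total


def pearson_r(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 1, 2, 2, 3, 3]
linear_ys = [1, 3, 3, 5, 5, 7]   # category means 2, 4, 6: linear in x
curved_ys = [5, 7, 1, 3, 5, 7]   # category means 6, 2, 6: not linear in x

eta2_linear, r_linear = eta_squared(xs, linear_ys), pearson_r(xs, linear_ys)
eta2_curved, r_curved = eta_squared(xs, curved_ys), pearson_r(xs, curved_ys)

print(eta2_linear, r_linear ** 2)  # both are 8/11, up to rounding
print(eta2_curved, r_curved ** 2)  # eta^2 is about 0.78 while r^2 is about 0
```

In the second data set the correlation coefficient is essentially zero even though the category means differ substantially, which is the situation the correlation ratio is meant to detect.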

Example
Suppose there is a distribution of test scores in three topics:
 * Algebra: 45, 70, 29, 15 and 21 (5 scores)
 * Geometry: 40, 20, 30 and 42 (4 scores)
 * Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).

Then the subject averages are 36, 33 and 78, with an overall average of 52.

The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860, while the overall sum of squares of the differences from the overall average is 9640. The difference between these, 6780, is also the weighted sum of the squares of the differences between the subject averages and the overall average:
 * $$5 (36-52)^2 + 4 (33-52)^2 +6 (78-52)^2 = 6780$$

This gives
 * $$\eta^2 = \frac{6780}{9640}=0.7033\ldots$$

suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root gives
 * $$\eta = \sqrt{\frac{6780}{9640}}=0.8386\ldots$$
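
The arithmetic of this example can be reproduced with a short script. It is a sketch for checking the figures quoted in the example; the variable names are chosen here for illustration only.

```python
from math import sqrt

# Test scores from the example, keyed by topic.
scores = {
    "Algebra": [45, 70, 29, 15, 21],
    "Geometry": [40, 20, 30, 42],
    "Statistics": [65, 95, 80, 70, 85, 73],
}

all_scores = [y for ys in scores.values() for y in ys]
overall_mean = sum(all_scores) / len(all_scores)              # 52.0
topic_means = {t: sum(ys) / len(ys) for t, ys in scores.items()}  # 36.0, 33.0, 78.0

# Sum of squared differences from each topic's own average.
within = {
    t: sum((y - topic_means[t]) ** 2 for y in ys) for t, ys in scores.items()
}
# Sum of squared differences from the overall average.
total = sum((y - overall_mean) ** 2 for y in all_scores)
# Weighted sum of squared differences between topic averages and overall average.
between = sum(
    len(ys) * (topic_means[t] - overall_mean) ** 2 for t, ys in scores.items()
)

eta_squared = between / total
eta = sqrt(eta_squared)

print(within)                 # 1952.0, 308.0 and 600.0, adding to 2860.0
print(total, between)         # 9640.0 6780.0
print(round(eta_squared, 4))  # 0.7033
print(round(eta, 4))          # 0.8386
```

The script also confirms that the total sum of squares splits into the within-topic part (2860) and the between-topic part (6780).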