Point-biserial correlation coefficient

The point-biserial correlation coefficient ($$r_{pb}$$) is a correlation coefficient used when one variable (e.g. Y) is dichotomous. Y can be either naturally dichotomous, like gender, or an artificially dichotomized variable. In most situations it is not advisable to dichotomize variables artificially.

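The point-biserial correlation is mathematically equivalent to the Pearson (product-moment) correlation: if we have one continuously measured variable X and a dichotomous variable Y, then $$r_{XY} = r_{pb}$$. This can be shown by assigning two distinct numerical values to the dichotomous variable and computing Pearson's r. Because r is unchanged by a positive linear rescaling of either variable, the particular pair of values does not matter; reversing which group gets the larger value only flips the sign.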
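
The equivalence can be checked numerically. The following is a minimal Python sketch on made-up data (the values in `x` and `group`, and the helper `pearson_r`, are hypothetical illustrations, not taken from any source): it computes Pearson's r with the dichotomous variable coded 0/1, coded -3/10, and with the coding reversed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: x is continuous, group is the dichotomous variable Y.
x = [2.1, 3.4, 1.9, 5.6, 4.8, 6.2, 3.3, 5.1]
group = [0, 0, 0, 1, 1, 1, 0, 1]

r_01 = pearson_r(x, group)                                # Y coded 0/1
r_other = pearson_r(x, [10 if g else -3 for g in group])  # Y coded -3/10, same order
r_reversed = pearson_r(x, [1 - g for g in group])         # codes swapped between groups

print(f"{r_01:.6f} {r_other:.6f} {r_reversed:.6f}")
```

The first two values agree up to floating-point rounding, and the third has the same magnitude with the opposite sign.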

To calculate $$r_{pb}$$, assume that the dichotomous variable Y takes the two values 0 and 1. If we divide the data set into two groups, group 1 containing the data points with the value 1 on Y and group 2 containing those with the value 0 on Y, then the point-biserial correlation coefficient is calculated as follows:

$$ r_{pb} = \frac{M_1 - M_0}{s_x} \sqrt{ \frac{n_1 n_0}{n(n-1)} }, $$

where $$M_1$$ is the mean value on the continuous variable X for all data points in group 1, and $$M_0$$ is the mean value on X for all data points in group 2. Further, $$n_1$$ is the number of data points in group 1, $$n_0$$ is the number of data points in group 2, $$n = n_1 + n_0$$ is the total sample size, and $$s_x$$ is the sample standard deviation of X over all $$n$$ data points (computed with $$n-1$$ in the denominator). This computational formula is derived from the formula for $$r_{XY}$$ and requires fewer calculation steps, so it is easier to compute by hand. It matters much less today, since statistical data analysis is almost always done by computer.
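
As a sketch of the computational formula, the Python function below evaluates it directly and compares the result with Pearson's r computed from deviations. It assumes $$s_x$$ is the sample standard deviation of X (Python's `statistics.stdev`, which divides by $$n-1$$); the function names and the data are hypothetical and for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def point_biserial(x, y):
    """r_pb from the computational formula; y must contain only 0 and 1."""
    group1 = [xi for xi, yi in zip(x, y) if yi == 1]
    group2 = [xi for xi, yi in zip(x, y) if yi == 0]  # the group coded 0
    n1, n0 = len(group1), len(group2)
    n = n1 + n0
    s_x = stdev(x)  # sample standard deviation of X (divides by n - 1)
    return (mean(group1) - mean(group2)) / s_x * sqrt(n1 * n0 / (n * (n - 1)))


def pearson_r(x, y):
    """Pearson correlation from deviations, for comparison."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [2.1, 3.4, 1.9, 5.6, 4.8, 6.2, 3.3, 5.1]  # hypothetical continuous X
y = [0, 0, 0, 1, 1, 1, 0, 1]                  # hypothetical dichotomous Y

r_formula = point_biserial(x, y)
r_pearson = pearson_r(x, y)
print(f"{r_formula:.6f} {r_pearson:.6f}")
```

The two printed values should agree up to floating-point rounding, since the formula is an algebraic rearrangement of Pearson's r for a 0/1-coded Y.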

A version with $$n^2$$ instead of $$n(n-1)$$ in the denominator of the square root can be found widely on the internet as well as in the literature, for example in the book Applied multiple regression/correlation analysis for the behavioral sciences. Paired with the sample standard deviation (which divides by $$n-1$$), that version is incorrect; it gives the right value only if $$s_x$$ is the population standard deviation (which divides by $$n$$). Glass and Hopkins' book Statistical Methods in Education and Psychology (3rd edition) contains the correct formula.
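
The effect of the two denominators can be checked numerically. This minimal Python sketch on made-up data pairs each denominator with a kind of standard deviation (`statistics.stdev` for the sample version, `statistics.pstdev` for the population version); the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

x = [2.1, 3.4, 1.9, 5.6, 4.8, 6.2, 3.3, 5.1]  # hypothetical continuous X
y = [0, 0, 0, 1, 1, 1, 0, 1]                  # hypothetical dichotomous Y

m1 = mean([xi for xi, yi in zip(x, y) if yi == 1])
m0 = mean([xi for xi, yi in zip(x, y) if yi == 0])
n1, n0 = y.count(1), y.count(0)
n = n1 + n0
diff = m1 - m0

# Sample SD (divides by n - 1) paired with n(n - 1).
r_sample = diff / stdev(x) * sqrt(n1 * n0 / (n * (n - 1)))
# Population SD (divides by n) paired with n^2: algebraically the same value.
r_population = diff / pstdev(x) * sqrt(n1 * n0 / n**2)
# Sample SD paired with n^2: the mismatch, shrunk by a factor sqrt((n - 1) / n).
r_mixed = diff / stdev(x) * sqrt(n1 * n0 / n**2)

print(f"{r_sample:.6f} {r_population:.6f} {r_mixed:.6f}")
```

The first two results coincide; the mismatched pairing understates the magnitude of the correlation, with the error shrinking as $$n$$ grows.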