Pearson product-moment correlation coefficient

In statistics, the Pearson product-moment correlation coefficient (r, sometimes known as the PMCC) is a measure of the correlation of two variables X and Y measured on the same object or organism, that is, a measure of the tendency of the variables to increase or decrease together. It is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom:


 * $$ r = \frac {\sum z_x z_y}{n - 1}.$$

Note that this formula assumes the z-scores are computed from sample standard deviations, that is, standard deviations calculated with n − 1 in the denominator.

The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations.
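The equivalence of the two forms can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation; the helper names (`mean`, `sample_std`, `pearson_from_z_scores`, `pearson_from_covariance`) and the five data points are illustrative assumptions, and both the standard deviations and the covariance are assumed to use n − 1 in the denominator.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_from_z_scores(xs, ys):
    """r as the sum of products of z-scores divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (n - 1)


def pearson_from_covariance(xs, ys):
    """r as the sample covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sample_std(xs) * sample_std(ys))


# Illustrative data (hypothetical, chosen so that r works out to 0.8).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_z = pearson_from_z_scores(xs, ys)
r_cov = pearson_from_covariance(xs, ys)
print(r_z, r_cov)  # both approximately 0.8
```

For these points the sum of cross-products of deviations is 8 and each sum of squared deviations is 10, so both functions return approximately 0.8.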

The coefficient ranges from −1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of −1 shows that all data points lie on a single line but that Y increases as X decreases. A value of 0 shows that a linear model is inappropriate – that there is no linear relationship between the variables.
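The three reference values can be illustrated with small data sets. This is a sketch under stated assumptions: the function name `pearson_r` and the data are hypothetical, and the formula used is the covariance form, in which the n − 1 factors cancel. The third case is a symmetric parabola, chosen to show that a coefficient of 0 rules out a linear relationship only, not every relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products.

    The n - 1 factors in the covariance and the standard deviations
    cancel, so they are omitted here.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# All points on a rising line: r is 1 (up to rounding).
r_up = pearson_r(xs, [3.0 * x + 1.0 for x in xs])

# All points on a falling line: r is -1 (up to rounding).
r_down = pearson_r(xs, [-3.0 * x + 1.0 for x in xs])

# Y = X^2 on a symmetric range: Y depends entirely on X, yet r is 0.
r_parabola = pearson_r(xs, [x * x for x in xs])

print(r_up, r_down, r_parabola)
```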

The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.

The linear equation that best describes the relationship between X and Y can be found by linear regression. This equation can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X the equation calculates a value which is the best estimate of the values of Y corresponding to that specific value. We denote this predicted variable by Y′.

Any value of Y can therefore be defined as the sum of Y′ and the difference between Y and Y′:


 * $$Y = Y^\prime + (Y - Y^\prime).$$

The variance of Y is equal to the sum of the variances of the two components of Y:


 * $$s_y^2 = s_{y^\prime}^2 + s^2_{y.x}.$$

Since the coefficient of determination implies that $$s_{y.x}^2 = s_y^2 (1 - r^2)$$, we can derive the identity


 * $$r^2 = {s_{y^\prime}^2 \over s_y^2}.$$
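The identity above follows by substituting the expression for the residual variance into the variance decomposition and solving for r squared. Written out step by step, using only the two relations already stated:

 * $$s_y^2 = s_{y^\prime}^2 + s_y^2 (1 - r^2) \quad\Longrightarrow\quad s_{y^\prime}^2 = s_y^2 - s_y^2 (1 - r^2) = r^2 s_y^2 \quad\Longrightarrow\quad r^2 = \frac{s_{y^\prime}^2}{s_y^2}.$$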

The square of r is conventionally used as a measure of the association between X and Y. For example, if the coefficient is 0.90, then 81% of the variance of Y can be "accounted for" by changes in X and the linear relationship between X and Y.
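The variance decomposition and the interpretation of r squared can be checked on a small example. The sketch below fits a least-squares line by hand and compares the variance of the predicted values with the variance of Y. The data and all names (`slope`, `intercept`, `predicted`, `residuals`, `sample_variance`) are illustrative assumptions, and every variance, including the residual variance, is assumed to use n − 1 in the denominator so that the decomposition holds exactly.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance (n - 1 in the denominator)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Illustrative data (hypothetical); Pearson's r for these points is 0.8.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

# Ordinary least-squares line Y' = intercept + slope * X.
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

predicted = [intercept + slope * x for x in xs]    # Y'
residuals = [y - p for y, p in zip(ys, predicted)]  # Y - Y'

var_y = sample_variance(ys)              # s_y^2
var_pred = sample_variance(predicted)    # s_{y'}^2
var_resid = sample_variance(residuals)   # s_{y.x}^2 under the n - 1 assumption

r_squared = var_pred / var_y

print(var_y, var_pred + var_resid)  # the two sides of the decomposition
print(r_squared)                    # approximately 0.64, i.e. 0.8 squared
```

Here about 64% of the variance of Y is accounted for by the fitted line, matching the square of the coefficient 0.8.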

In computer software

 * The  function in many major spreadsheet packages, such as Microsoft Excel, OpenOffice.org Calc and Gnumeric, calculates Pearson's correlation coefficient. Note that versions of Excel prior to 2003 exhibited rounding errors in this function and others.
 * The  function in Microsoft Excel also calculates Pearson's correlation coefficient.
 * In MATLAB and Minitab,  calculates Pearson's correlation coefficient along with a p-value.
 * In MATLAB, Scilab, and GNU Octave,  calculates Pearson's correlation coefficient.
 * In S-Plus and R,  calculates Pearson's correlation coefficient.
 * In MATLAB syntax, R = corrcoef(X) returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are observations and whose columns are variables.


 * In IDL, the CORRELATE function computes the PMCC.