Yates' correction for continuity

In statistics, Yates' correction for continuity (or Yates' chi-square test) is used in certain situations when testing for independence in a contingency table. The correction is needed because the chi-square test assumes that the discrete distribution of the observed frequencies can be approximated by the chi-squared distribution, which is continuous.

To overcome this, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-square test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table. This reduces the chi-square value obtained and thus increases its p-value, which prevents overestimation of statistical significance for small samples. The correction is chiefly used when at least one cell of the table has an expected frequency less than 5. Unfortunately, Yates' correction may tend to overcorrect, giving an overly conservative result that fails to reject the null hypothesis when it should.


 * $$ \chi_\mathrm{Yates}^2 = \sum_{i=1}^{N} {(|O_i - E_i| - 0.5)^2 \over E_i}$$

where:


 * O_i = an observed frequency
 * E_i = an expected (theoretical) frequency, asserted by the null hypothesis
 * N = number of distinct events
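The sum above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the example counts are hypothetical, and the expected frequencies are assumed to come from the usual independence model (row total times column total, divided by the grand total).

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table, yates=True):
    """Sum over all cells of (|O - E| - 0.5)^2 / E; Pearson's form if yates=False."""
    shift = 0.5 if yates else 0.0
    expected = expected_counts(table)
    return sum(
        (abs(o - e) - shift) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of counts, chosen only for illustration.
observed = [[10, 20], [20, 10]]

pearson = chi_square(observed, yates=False)
corrected = chi_square(observed, yates=True)
print(f"Pearson: {pearson:.4f}")
print(f"Yates:   {corrected:.4f}")
```

For this table every expected count is 15 and every absolute difference is 5, so the uncorrected statistic is 4 × 25/15 ≈ 6.67 and the corrected one is 4 × 4.5²/15 = 5.4, which shows the reduction described above.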

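As a shortcut, for a 2 × 2 table with the following entries (row totals N_A and N_B, column totals N_S and N_F, and grand total N, which here denotes the total sample size rather than the number of cells):

 * $$\begin{array}{c|cc|c} & S & F & \\ \hline A & a & b & N_A \\ B & c & d & N_B \\ \hline & N_S & N_F & N \end{array}$$

we can write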


 * $$\chi_\mathrm{Yates}^2 = \frac{N(|ad - bc| - N/2)^2}{N_S N_F N_A N_B}.$$
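The shortcut can be sketched as follows. The layout is an assumption made for this example: a and b are taken as the first row and c and d as the second, so that N_A = a + b, N_B = c + d, N_S = a + c and N_F = b + d. The function names and counts are hypothetical, and the p-value helper relies on the fact that a 2 × 2 table has one degree of freedom, for which the chi-squared tail probability equals erfc(sqrt(x / 2)).

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Yates-corrected chi-square for a 2 x 2 table via the shortcut formula."""
    n_a, n_b = a + b, c + d  # row totals (assumed layout)
    n_s, n_f = a + c, b + d  # column totals (assumed layout)
    n = n_a + n_b            # grand total
    return n * (abs(a * d - b * c) - n / 2) ** 2 / (n_s * n_f * n_a * n_b)


def p_value_1df(chi2):
    """Upper-tail probability of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts, chosen only for illustration.
stat = yates_chi_square_2x2(10, 20, 20, 10)
p = p_value_1df(stat)
print(f"chi-square (Yates) = {stat:.4f}, p = {p:.4f}")
```

With these counts the shortcut gives 60 × 270² / 30⁴ = 5.4, the same value obtained by summing (|O − E| − 0.5)² / E over the four cells.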

Other sources recommend applying the correction when an expected frequency is less than 10.

Yet other sources say that Yates' correction should always be applied. With large sample sizes, however, the correction has little effect on the value of the test statistic, and hence on the p-value obtained.
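The shrinking effect of the correction at large sample sizes can be illustrated by scaling a table while keeping its proportions fixed. The sketch below is only an illustration: the counts are hypothetical, the cells a, b, c, d are assumed to be laid out row by row, and the uncorrected statistic uses the standard 2 × 2 form N(ad − bc)² divided by the product of the four marginal totals.

```python
def pearson_and_yates_2x2(a, b, c, d):
    """Return (uncorrected, Yates-corrected) chi-square for a 2 x 2 table."""
    n = a + b + c + d
    marginals = (a + b) * (c + d) * (a + c) * (b + d)
    diff = abs(a * d - b * c)
    pearson = n * diff ** 2 / marginals
    yates = n * (diff - n / 2) ** 2 / marginals
    return pearson, yates


# Hypothetical base table, multiplied by increasing scale factors.
base = (10, 20, 20, 10)
ratios = []
for scale in (1, 10, 100):
    cells = [scale * x for x in base]
    pearson, yates = pearson_and_yates_2x2(*cells)
    ratios.append(yates / pearson)
    print(f"N = {sum(cells):5d}  Pearson = {pearson:9.3f}  "
          f"Yates = {yates:9.3f}  ratio = {yates / pearson:.4f}")
```

For this table the ratio of the corrected to the uncorrected statistic is (1 − 0.1/k)² at scale factor k, so it rises from 0.81 at N = 60 to about 0.998 at N = 6000.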