False discovery rate

False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. It controls the expected proportion of incorrectly rejected null hypotheses (type I errors) among all rejected hypotheses. FDR control is less conservative than familywise error rate (FWER) control and therefore has greater power, at the cost of an increased likelihood of type I errors.

The q-value is the FDR analogue of the p-value: the q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to estimate q-values directly rather than fixing a level at which to control the FDR.
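As a sketch, q-values in the simplified "BH-adjusted p-value" sense can be computed from the sorted p-values as $$q_{(i)} = \min_{j \geq i} \, m \, P_{(j)} / j$$. (Note this simplification does not estimate the proportion of true nulls, as Storey's q-value estimator does; it is the conservative special case where that proportion is taken to be 1.)

```python
import numpy as np

def bh_qvalues(pvals):
    """BH-adjusted p-values: q_(i) = min over j >= i of m * P_(j) / j."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices that sort p ascending
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking running minima from the largest p-value down.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)      # q-values cannot exceed 1
    return q
```

Rejecting every hypothesis whose q-value is at most $$\alpha$$ then reproduces the step-up procedure described below.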

Classification of m hypothesis tests
The following table defines some random variables related to the m hypothesis tests.


|  | Null hypothesis is true | Alternative hypothesis is true | Total |
|---|---|---|---|
| Declared significant | $$V$$ | $$S$$ | $$R$$ |
| Declared non-significant | $$U$$ | $$T$$ | $$m - R$$ |
| Total | $$m_0$$ | $$m - m_0$$ | $$m$$ |

 * $$H_1 \ldots H_m$$ are the null hypotheses being tested
 * $$m_0$$ is the number of true null hypotheses
 * $$m - m_0$$ is the number of false null hypotheses
 * $$U$$ is the number of true negatives
 * $$V$$ is the number of false positives
 * $$T$$ is the number of false negatives
 * $$S$$ is the number of true positives
 * $$R = V + S$$ is the number of rejected null hypotheses

In $$m$$ hypothesis tests of which $$m_0$$ are true null hypotheses, $$R$$ is an observable random variable, while $$S$$, $$T$$, $$U$$, and $$V$$ are unobservable random variables.
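These quantities can be made concrete with a small simulation. The sketch below uses illustrative values of $$m$$ and $$m_0$$ and an assumed Beta distribution for p-values under the alternative, then tallies $$U$$, $$V$$, $$T$$, $$S$$, and $$R$$ for a naive, uncorrected per-test threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
m, m0 = 1000, 800                        # total tests and true nulls (illustrative)
# p-values are uniform under true nulls; the Beta(0.1, 5) alternative is an assumption
p_null = rng.uniform(size=m0)
p_alt = rng.beta(0.1, 5.0, size=m - m0)
pvals = np.concatenate([p_null, p_alt])
is_null = np.arange(m) < m0              # which hypotheses are truly null

reject = pvals <= 0.05                   # naive per-test threshold, no correction
V = int(np.sum(reject & is_null))        # false positives
S = int(np.sum(reject & ~is_null))       # true positives
U = int(np.sum(~reject & is_null))       # true negatives
T = int(np.sum(~reject & ~is_null))      # false negatives
R = V + S                                # total rejections: the observable count
```

In a real analysis only `pvals` and hence `R` would be observable; `is_null` (and so `U`, `V`, `T`, `S`) is known here only because the data are simulated.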

The false discovery rate is given by $$\mathrm{E}\!\left [\frac{V}{V+S}\right ] = \mathrm{E}\!\left [\frac{V}{R}\right ]$$ (with $$V/R$$ defined to be 0 when $$R = 0$$), and one wants to keep this value below a threshold $$\alpha$$.

Independent tests
The Simes procedure, applied in this context as the Benjamini-Hochberg procedure, ensures that its expected value $$\mathrm{E}\!\left[ \frac{V}{V + S} \right]\,$$ is less than a given $$\alpha$$ (Benjamini and Hochberg 1995). This procedure is valid when the $$m$$ tests are independent. Let $$H_1 \ldots H_m$$ be the null hypotheses and $$P_1 \ldots P_m$$ their corresponding p-values. Sort these p-values in increasing order and denote them by $$P_{(1)} \ldots P_{(m)}$$. For a given $$\alpha$$, find the largest $$k$$ such that


 * $$P_{(k)} \leq \frac{k}{m} \alpha.$$

Then reject (i.e. declare positive) all $$H_{(i)}$$ for $$i = 1, \ldots, k$$. Note that the mean $$\alpha$$ for these $$m$$ tests is $$\frac{\alpha(m+1)}{2m}$$, which could be used as a rough FDR (RFDR), or "$$\alpha$$ adjusted for $$m$$ independent tests."
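The step-up procedure above can be sketched directly in code: sort the p-values, find the largest $$k$$ with $$P_{(k)} \leq \frac{k}{m}\alpha$$, and reject the hypotheses with the $$k$$ smallest p-values.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up FDR procedure for independent tests.

    Returns a boolean mask, in the original order, of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                           # indices that sort p ascending
    sorted_p = p[order]
    # Compare P_(k) against (k/m) * alpha for k = 1..m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = sorted_p <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])            # 0-based index of largest k
        reject[order[: k + 1]] = True               # reject H_(1) ... H_(k)
    return reject
```

Note that hypotheses with $$P_{(i)} > \frac{i}{m}\alpha$$ for $$i \le k$$ are still rejected: only the largest qualifying $$k$$ matters, which is what makes this a step-up procedure.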

Dependent tests
The Benjamini and Yekutieli procedure controls the false discovery rate under dependence assumptions. This refinement modifies the threshold and finds the largest $$k$$ such that:


 * $$P_{(k)} \leq \frac{k}{m \cdot c(m)} \alpha $$


 * If the tests are independent: $$c(m) = 1$$ (recovering the procedure above)
 * If the tests are positively correlated: $$c(m) = 1$$
 * If the tests are arbitrarily dependent (including negatively correlated): $$c(m) = \sum _{i=1} ^m \frac{1}{i}$$

In the case of arbitrary dependence, $$c(m)$$ can be approximated by using the Euler-Mascheroni constant:


 * $$\sum _{i=1} ^m \frac{1}{i} \approx \ln(m) + \gamma.$$
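A quick numerical check of this approximation (using an illustrative $$m = 1000$$; the error of the approximation is of order $$1/(2m)$$):

```python
import math

m = 1000
harmonic = sum(1.0 / i for i in range(1, m + 1))   # c(m) = H_m, the m-th harmonic number
approx = math.log(m) + 0.5772156649                # ln(m) + Euler-Mascheroni constant
# For m = 1000 the two agree to roughly 1/(2m) = 5e-4.
assert abs(harmonic - approx) < 1e-3
```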

Using the RFDR above, an approximate FDR (AFDR) for $$m$$ dependent tests is the minimum of the mean $$\alpha$$, i.e. AFDR = RFDR / (ln($$m$$) + 0.57721...).
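The Benjamini and Yekutieli procedure is the same step-up search as before with the threshold divided by $$c(m)$$; a sketch for the arbitrary-dependence case, where $$c(m)$$ is the harmonic sum:

```python
import numpy as np

def benjamini_yekutieli(pvals, alpha=0.05):
    """Step-up FDR procedure valid under arbitrary dependence.

    Uses c(m) = sum_{i=1}^m 1/i; returns a boolean mask of rejections."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))         # harmonic correction factor
    order = np.argsort(p)
    # Compare P_(k) against (k / (m * c(m))) * alpha for k = 1..m.
    thresholds = alpha * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```

Since $$c(m) \geq 1$$, every threshold shrinks relative to the independent-test procedure, so this version rejects at most as many hypotheses: the price of robustness to dependence is reduced power.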