Kruskal-Wallis one-way analysis of variance

In statistics, the Kruskal-Wallis one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) is a non-parametric method for testing equality of population medians among groups. Intuitively, it is identical to a one-way analysis of variance with the data replaced by their ranks. It is an extension of the Mann-Whitney U test to 3 or more groups.

Since it is a non-parametric method, the Kruskal-Wallis test does not assume a normal population, unlike the analogous one-way analysis of variance. However, when it is interpreted as a test of medians, it does assume that the groups' distributions have the same shape and scale and differ at most in location, so it is not free of an equal-variability (homoscedasticity) requirement. To get around this limitation of the Kruskal-Wallis test, some statisticians suggest using a robust test for equal locations among groups instead.

Method

 * 1) Rank all data from all groups together; i.e., rank the data from 1 to N ignoring group membership. Assign any tied values the average of the ranks they would have received had they not been tied.
 * 2) The test statistic is given by: $$K = (N-1)\frac{\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^g\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2}$$, where:
    * $$g$$ is the number of groups
    * $$n_i$$ is the number of observations in group $$i$$
    * $$r_{ij}$$ is the rank (among all observations) of observation $$j$$ from group $$i$$
    * $$N$$ is the total number of observations across all groups
    * $$\bar{r}_{i\cdot} = \frac{\sum_{j=1}^{n_i}{r_{ij}}}{n_i}$$ is the average rank of the observations in group $$i$$
    * $$\bar{r} = (N+1)/2$$ is the average of all the $$r_{ij}$$
 * Notice that, when there are no ties, the denominator of the expression for $$K$$ is exactly $$(N-1)N(N+1)/12$$. Thus $$K = \frac{12}{N(N+1)}\sum_{i=1}^g n_i(\bar{r}_{i\cdot} - \bar{r})^2$$.
 * 3) If this shortcut form is used on data with ties, a correction can be made by dividing $$K$$ by $$1 - \frac{\sum_{i=1}^G (t_{i}^3 - t_{i})}{N^3-N}$$, where $$G$$ is the number of sets of tied values and $$t_i$$ is the number of observations tied at the $$i$$-th such value. (The general expression in step 2, computed from average ranks, already includes this correction.) The correction usually makes little difference in the value of $$K$$ unless there are a large number of ties.
 * 4) Finally, the p-value is approximated by $$\Pr(\chi^2_{g-1} \ge K)$$. If some of the $$n_i$$ are small (i.e., less than 5), the probability distribution of $$K$$ can be quite different from this chi-square distribution. If a table of the chi-square probability distribution is available, the critical value of chi-square, $$\chi^2_{\alpha: g-1}$$, can be found by entering the table at $$g-1$$ degrees of freedom and looking under the desired significance or alpha level. The null hypothesis of equal population medians would then be rejected if $$K \ge \chi^2_{\alpha: g-1}$$. Appropriate multiple comparisons would then be performed on the group medians.
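
The procedure above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`average_ranks`, `chi2_sf`, `kruskal_wallis`) and the sample data are made up for this example, and the chi-square tail probability is computed with a small hand-written helper rather than a statistics library. The function returns $$K$$ from the general expression, the shortcut form divided by the tie correction (the two should agree), and the approximate p-value.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            ranks[k] = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1..end+1
        pos = end + 1
    return ranks


def chi2_sf(x, df):
    """Pr(chi-square with df degrees of freedom >= x), for integer df >= 1.

    Illustrative helper: starts from the closed form for df = 1 or df = 2 and
    applies the upper incomplete gamma recurrence Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1).
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2:
        q += math.exp(-y) * y ** a / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(groups):
    """Return (K, K_shortcut, p) for a list of groups (each a list of numbers).

    Assumes at least two distinct values overall; otherwise the denominator is zero.
    """
    pooled = [x for grp in groups for x in grp]
    N = len(pooled)
    ranks = average_ranks(pooled)  # ranking ignores group membership
    r_bar = (N + 1) / 2

    # Numerator: sum over groups of n_i * (mean rank of group i - r_bar)^2
    between, start = 0.0, 0
    for grp in groups:
        n_i = len(grp)
        mean_rank = sum(ranks[start:start + n_i]) / n_i
        between += n_i * (mean_rank - r_bar) ** 2
        start += n_i

    # General expression for K (average ranks already account for ties).
    total = sum((r - r_bar) ** 2 for r in ranks)
    K = (N - 1) * between / total

    # Shortcut form divided by the tie correction; untied values contribute t^3 - t = 0.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    K_shortcut = 12 / (N * (N + 1)) * between / (1 - ties / (N ** 3 - N))

    # Chi-square approximation with g - 1 degrees of freedom.
    p = chi2_sf(K, len(groups) - 1)
    return K, K_shortcut, p


K, K_shortcut, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(K, 3), round(K_shortcut, 3), round(p, 4))  # 7.2 7.2 0.0273
```

In this toy data set every group has only three observations, so, as noted above, the chi-square approximation to the p-value should be treated as rough. In practice a statistics library is usually preferable; SciPy, for instance, provides `scipy.stats.kruskal`.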