Diversity index

In ecology, a diversity index is a statistic which is intended to measure the biodiversity of an ecosystem. More generally, diversity indices can be used to assess the diversity of any population in which each member belongs to one of a set of species (or other categories). Estimators for diversity indices are likely to be biased, so caution is advisable when comparing similar values.

Species richness
The species richness $$S$$ is simply the number of species present in an ecosystem. This index makes no use of relative abundances.

Simpson's diversity index
If $$p_i$$ is the fraction of all organisms which belong to the i-th species, then Simpson's diversity index is most commonly defined as the statistic
 * $$ D = \sum_{i=1}^S p_i^2.$$

This quantity was introduced by Edward Hugh Simpson.

If $$n_i$$ is the number of individuals of species i which are counted, and $$N$$ is the total number of all individuals counted, then
 * $$ \sum_{i=1}^S \frac{n_i (n_i -1)}{N (N-1)} $$

is an estimator for Simpson's index for sampling without replacement.
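The two formulas above can be checked numerically. The following is a minimal sketch in Python; the function names and the species counts are hypothetical, chosen only for illustration.

```python
def simpson_index(counts):
    """Plug-in Simpson's index D = sum p_i^2, with p_i = n_i / N."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)

def simpson_estimator(counts):
    """Estimator sum n_i(n_i - 1) / (N(N - 1)) for sampling without replacement."""
    total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))

counts = [50, 30, 20]  # hypothetical abundances of S = 3 species, N = 100
print(simpson_index(counts))      # 0.38
print(simpson_estimator(counts))  # 3700/9900, slightly below the plug-in value
```

For large $$N$$ the two values converge, since $$n_i(n_i-1)/(N(N-1)) \approx (n_i/N)^2$$.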

Note that $$0 \leq D \leq 1$$, with values near zero corresponding to highly diverse or heterogeneous ecosystems and values near one corresponding to more homogeneous ecosystems. Biologists who find this confusing sometimes use $$1/D$$ instead; confusingly, this reciprocal quantity is also called Simpson's index. A more sensible response is to redefine Simpson's index as
 * $$\tilde{D} = 1 - D = 1 - \sum_{i=1}^S p_i^2,$$

(called by statisticians the index of diversity), since
 * this quantity has a simple intuitive interpretation: it represents the probability that two randomly chosen individuals belong to distinct species,
 * this quantity is comparable with the Shannon diversity index, which has an even better theoretical justification as a measure of statistical inhomogeneity.

Shannon's diversity index
Shannon's diversity index is simply the ecologist's name for the communication entropy introduced by Claude Shannon:
 * $$ H = -\sum_{i=1}^S p_i \log p_i $$

where $$p_i$$ is the fraction of individuals belonging to the i-th species. This is by far the most widely used diversity index. Its intuitive significance can be described as follows. Suppose we devise binary codewords for each species in our ecosystem, with short codewords for the most abundant species and longer codewords for rare species. As we walk around and observe individual organisms, we call out the corresponding codeword, producing a binary sequence. If we have used an efficient code, we save some breath by calling out a shorter sequence than we otherwise would. The average codeword length we call out as we wander around will then be close to the Shannon diversity index (computed with base-2 logarithms).
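A minimal sketch of the index itself, with the base of the logarithm left as a parameter (base 2 gives bits, matching the codeword picture above; the natural logarithm gives nats). The function name and the relative abundances are hypothetical.

```python
import math

def shannon_index(proportions, base=math.e):
    """Shannon diversity H = -sum p_i log p_i; zero-abundance terms are dropped,
    consistent with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in proportions if p > 0)

p = [0.5, 0.25, 0.25]  # hypothetical relative abundances
print(shannon_index(p, base=2))  # 1.5 bits
```

For these abundances an optimal binary code assigns codewords of lengths 1, 2, and 2, so the average codeword length equals the entropy exactly.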

It is possible to write down estimators which attempt to correct for bias in finite samples, but any such correction would be misleading, since communication entropy does not fit the expectations of parametric statistics. Differences between estimators are likely to be overwhelmed by errors arising from other sources. Current best practice uses bootstrapping procedures to estimate communication entropy.

Shannon himself showed that his communication entropy enjoys some powerful formal properties, and furthermore, it is the unique quantity which does so. These observations are the foundation of its interpretation as a measure of statistical diversity (or "surprise", in the arena of communications). The applications of this quantity go far beyond the one discussed here; see the textbook cited below for an elementary survey of the extraordinary richness of modern information theory.

Berger-Parker index
The Berger-Parker diversity index is simply
 * $$\operatorname{max}_{1 \leq i \leq S} \, p_i$$

This is an example of an index which uses only partial information about the relative abundances of the various species in its definition.

Renyi entropy
Species richness, the Shannon index, Simpson's index, and the Berger-Parker index can all be identified as particular examples of quantities bearing a simple relation to the Renyi entropy,
 * $$H_\alpha = \frac{1}{1-\alpha} \; \log \sum_{i=1}^S p_i^\alpha$$

for $$\alpha$$ approaching $$0, \, 1, \, 2, \, \infty$$ respectively.

Unfortunately, the powerful formal properties of communication entropy do not generalize to Renyi's entropy, which largely explains the greater power and popularity of Shannon's index compared with its competitors.