Shannon index

The Shannon index $$H^{\prime}$$ (also called, incorrectly, the Shannon–Weaver index or the Shannon–Wiener index) is one of several diversity indices used to measure diversity in categorical data. It is the information entropy of the distribution, treating species as symbols and their relative population sizes as their probabilities.

This article treats its use in the measurement of biodiversity. The advantage of this index is that it accounts for both the number of species and the evenness of their abundances: the index increases either with more species or with greater species evenness.

The "Shannon–Weaver" name is a misnomer; apparently some biologists concluded that Warren Weaver, author of an influential preface to the book form of Claude Shannon's 1948 paper founding information theory, was a cofounder of the theory. Weaver did, however, play a crucial role in the rapid postwar development of information theory in a different way: as an influential early administrator of the Rockefeller Foundation, he ensured that the first information theorists received generous research grants. Norbert Wiener had no hand in the index either, although his influential popularisation of cybernetics was often conflated with information theory in the 1950s.

Definitions

 * $$n_i$$ The number of individuals of species $$i$$, i.e. its abundance.
 * $$S$$ The number of species. Also called species richness.
 * $$N$$ The total number of individuals: $$N = \sum_{i=1}^S n_i$$
 * $$p_i$$ The relative abundance of species $$i$$, i.e. the proportion of all individuals in the community that belong to it: $$p_i = {n_i\over N}$$
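As a minimal sketch of these definitions, the Python snippet below derives $$n_i$$, $$S$$, $$N$$ and $$p_i$$ from raw field observations. The species names and counts are hypothetical illustration data, and storing one species label per observed individual is an assumption about how the data is recorded.

```python
from collections import Counter

# Hypothetical field data: one species label per observed individual.
observations = ["oak", "oak", "beech", "oak", "birch", "beech"]

abundances = Counter(observations)  # n_i for each species
S = len(abundances)                 # species richness
N = sum(abundances.values())        # total number of individuals
p = {species: n_i / N for species, n_i in abundances.items()}  # p_i = n_i / N

print(dict(abundances))  # {'oak': 3, 'beech': 2, 'birch': 1}
print(S, N)              # 3 6
```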

Computing the index

 * $$H^\prime = -\sum_{i=1}^S p_i \ln p_i$$
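The formula translates directly into code. This is a minimal sketch assuming the abundances $$n_i$$ arrive as a list of non-negative integers; `shannon_index` is a hypothetical helper name, and species with $$n_i = 0$$ are skipped, following the usual convention that $$0 \ln 0 = 0$$. The three calls illustrate the two ways the index grows: greater evenness and more species.

```python
import math

def shannon_index(counts):
    """H' = -sum(p_i * ln p_i), with p_i = n_i / N."""
    N = sum(counts)
    # Zero counts are skipped: p ln p -> 0 as p -> 0.
    return -sum((n / N) * math.log(n / N) for n in counts if n > 0)

print(round(shannon_index([10, 10]), 4))      # 0.6931  (two equally common species)
print(round(shannon_index([18, 2]), 4))       # 0.3251  (same richness, less even)
print(round(shannon_index([10, 10, 10]), 4))  # 1.0986  (one more species, still even)
```

Because `math.log` is the natural logarithm, the result is in nats, matching the $$\ln$$ in the formula.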

For a given number of species $$S$$, $$H^\prime$$ has a maximum possible value, $$H_\max=\ln S$$, which is reached when all species are present in equal numbers. This can be shown with calculus, as in the proof below.
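A quick numerical check of the $$\ln S$$ bound, using hypothetical four-species communities of 100 individuals each: the perfectly even community reaches $$\ln 4$$, while the uneven ones stay below it.

```python
import math

def shannon_index(counts):
    """H' = -sum(p_i ln p_i) with p_i = n_i / N."""
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts if n > 0)

S = 4
even = [25, 25, 25, 25]
uneven = [[40, 30, 20, 10], [70, 10, 10, 10], [97, 1, 1, 1]]  # hypothetical

print(math.isclose(shannon_index(even), math.log(S)))       # True
print(all(shannon_index(c) < math.log(S) for c in uneven))  # True
```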

Proof that maximum evenness maximizes the index
The following argument shows that, for a fixed number of individuals and species, a population has the maximum Shannon index if and only if every species is represented by the same number of individuals.

Expanding the index:


 * $$H^\prime = -\sum_{i=1}^S {n_i\over N} \ln {n_i\over N}$$


 * $$N H^\prime = -\sum_{i=1}^S n_i \left ( \ln n_i - \ln N \right ) = -\sum_{i=1}^S n_i \ln n_i + \ln N \sum_{i=1}^S n_i = -\sum_{i=1}^S n_i \ln n_i + N \ln N$$


 * $$N H^\prime - N \ln N = -\sum_{i=1}^S n_i \ln n_i$$

Now define $$H_s = -\sum_{i=1}^S n_i \ln n_i$$. Since $$H^\prime = (H_s + N \ln N)/N$$, and both $$N$$ and $$N\ln N$$ are constants for a given population size, maximizing $$H_s$$ is equivalent to maximizing $$H^\prime$$.
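The identity $$N H^\prime - N \ln N = H_s$$, and hence $$H^\prime = (H_s + N \ln N)/N$$, can be checked numerically for any set of abundances. The counts below are arbitrary illustration values.

```python
import math

counts = [12, 7, 3, 1]  # hypothetical abundances n_i
N = sum(counts)

H_prime = -sum((n / N) * math.log(n / N) for n in counts)  # Shannon index H'
H_s = -sum(n * math.log(n) for n in counts)                # H_s = -sum(n_i ln n_i)

# N H' - N ln N equals H_s, so H' is an increasing function of H_s for fixed N.
print(math.isclose(N * H_prime - N * math.log(N), H_s))    # True
print(math.isclose(H_prime, (H_s + N * math.log(N)) / N))  # True
```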

Strategy
Split a population of arbitrary size into two groups, each receiving an arbitrary number of individuals and an arbitrary number of species. Within each group, every species has the same number of individuals as every other species in that group, but the number of individuals per species in the first group may differ from that in the second group.

If it can be proven that $$H_s$$ is maximized when the number of individuals per species in the first group matches that in the second group, then the population has a maximum index only when every species is evenly represented. Because $$H_s$$ is a sum over species that does not involve the total population $$N$$, the $$H_s$$ of the whole population is simply the sum of the $$H_s$$ values of the two sub-populations. Since the population size is arbitrary, this also covers the case of two species (the smallest number that can form two groups): their index is maximized when they are present in equal numbers. The requirements of mathematical induction are therefore satisfied.
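The additivity used in this strategy can be checked directly. Because $$H_s$$ is a plain sum over species, the $$H_s$$ of a combined population equals the sum over its two groups, whereas $$H^\prime$$ is not additive in this way because each $$p_i$$ depends on $$N$$. The group sizes below are hypothetical.

```python
import math

def H_s(counts):
    """H_s = -sum(n_i ln n_i); the total N never appears."""
    return -sum(n * math.log(n) for n in counts)

def shannon_index(counts):
    """H' = -sum(p_i ln p_i) with p_i = n_i / N, for comparison."""
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts)

group_1 = [6, 6, 6]  # hypothetical: 3 species, 6 individuals each
group_2 = [2, 2]     # hypothetical: 2 species, 2 individuals each
whole = group_1 + group_2

print(math.isclose(H_s(whole), H_s(group_1) + H_s(group_2)))  # True
print(math.isclose(shannon_index(whole),
                   shannon_index(group_1) + shannon_index(group_2)))  # False
```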

Proof
Now divide the species into two groups. Within each group, the individuals are evenly distributed among the species present. Define:
 * $$k$$ The number of individuals in the second group.
 * $$p$$ The number of species in the second group.
 * $$n_{i2} = k/p$$ The number of individuals in each species of the second group.
 * $$N-k$$ The number of individuals in the first group.
 * $$S-p$$ The number of species in the first group.
 * $$n_{i1} = {N-k \over S-p}$$ The number of individuals in each species of the first group.


 * $$H_s = -\sum_{i=1}^{S-p} {N-k \over S-p} \ln {N-k \over S-p} - \sum_{i=1}^p {k\over p} \ln {k \over p} = -\left ( N-k \right ) \ln {N-k \over S-p} - k \ln {k\over p}.$$
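To confirm the closed form, the sketch below compares it with the direct sum $$-\sum n_i \ln n_i$$ for one hypothetical split ($$N = 30$$, $$S = 5$$, $$k = 8$$, $$p = 2$$). As in the proof, counts are treated as real numbers, so the first group's $$22/3$$ individuals per species is allowed here.

```python
import math

def H_s_direct(counts):
    """H_s = -sum(n_i ln n_i), summed species by species."""
    return -sum(n * math.log(n) for n in counts)

def H_s_closed(N, S, k, p):
    """Closed form -(N-k) ln((N-k)/(S-p)) - k ln(k/p) from the proof."""
    return -(N - k) * math.log((N - k) / (S - p)) - k * math.log(k / p)

N, S, k, p = 30, 5, 8, 2  # hypothetical sizes
# First group: S - p species with (N - k)/(S - p) individuals each.
# Second group: p species with k/p individuals each.
counts = [(N - k) / (S - p)] * (S - p) + [k / p] * p

print(math.isclose(H_s_direct(counts), H_s_closed(N, S, k, p)))  # True
```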

To find the value of $$k$$ that maximizes $$H_s$$, we solve


 * $${d\over dk}\, H_s=0.$$

Differentiating,


 * $$\ln { N-k \over S-p} + (N-k){1 \over N-k} - \ln {k\over p} - k {1 \over k} = 0,$$


 * $$\ln {N-k\over S-p} = \ln {k \over p}$$
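Because the $$+1$$ and $$-1$$ terms cancel, the derivative reduces to $$\ln {N-k\over S-p} - \ln {k\over p}$$. The sketch below checks that expression against a central finite-difference estimate for hypothetical sizes $$N = 30$$, $$S = 5$$, $$p = 2$$; the step `h` and the sample values of $$k$$ are arbitrary choices. At $$k = 12$$ the derivative is exactly zero.

```python
import math

def H_s(N, S, k, p):
    """Closed form -(N-k) ln((N-k)/(S-p)) - k ln(k/p) from the proof."""
    return -(N - k) * math.log((N - k) / (S - p)) - k * math.log(k / p)

def dH_s_dk(N, S, k, p):
    """Analytic derivative after the +1 and -1 terms cancel."""
    return math.log((N - k) / (S - p)) - math.log(k / p)

N, S, p = 30, 5, 2  # hypothetical sizes
h = 1e-6            # finite-difference step (arbitrary small value)

checks = []
for k in (3.0, 8.0, 12.0, 20.0):
    # Central difference (H_s(k + h) - H_s(k - h)) / 2h approximates dH_s/dk.
    numeric = (H_s(N, S, k + h, p) - H_s(N, S, k - h, p)) / (2 * h)
    analytic = dH_s_dk(N, S, k, p)
    checks.append((k, numeric, analytic))
    print(f"k = {k:4.1f}   numeric = {numeric:+.6f}   analytic = {analytic:+.6f}")
```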

Exponentiating:


 * $${N-k\over S-p} = {k \over p},$$ so $$(N-k)\,p = k\,(S-p)$$, which gives $$k = {pN \over S}$$ and therefore $${k\over p} = {N \over S}.$$

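Now, by applying the definitions of $$n_{i1}$$ and $$n_{i2}$$, we get


 * $$n_{i1} = n_{i2} = {N\over S}.$$

This stationary point is a maximum, because $${d^2\over dk^2}\, H_s = -{1\over N-k} - {1\over k} < 0$$ for $$0 < k < N$$.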
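As a brute-force cross-check of this result, a fine grid search over $$k$$ finds the maximizing split for hypothetical sizes $$N = 30$$, $$S = 5$$, $$p = 2$$. It lands on $$k = pN/S = 12$$, where both groups have $$N/S = 6$$ individuals per species. The grid resolution is an arbitrary choice.

```python
import math

def H_s(N, S, k, p):
    """Closed form -(N-k) ln((N-k)/(S-p)) - k ln(k/p) from the proof."""
    return -(N - k) * math.log((N - k) / (S - p)) - k * math.log(k / p)

N, S, p = 30, 5, 2  # hypothetical sizes
# Scan k over the open interval (0, N) and keep the k with the largest H_s.
grid = [N * i / 10_000 for i in range(1, 10_000)]
k_best = max(grid, key=lambda k: H_s(N, S, k, p))

print(k_best, p * N / S)                   # 12.0 12.0
print((N - k_best) / (S - p), k_best / p)  # 6.0 6.0
```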

Result
This completes the proof that the Shannon index is maximized when each species is present in equal numbers. In that case $$n_i = {N\over S}$$, so $$p_i = {1\over S}$$, and therefore:


 * $$H_\max = - \sum_{i=1}^S {1\over S} \ln {1\over S} = \ln S.$$
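For small integer populations the whole result can be checked exhaustively. The sketch below enumerates every way of splitting $$N = 12$$ individuals among $$S = 4$$ species, each with at least one individual (the stars-and-bars construction), and confirms that the even split attains $$\ln 4$$. It assumes $$S$$ divides $$N$$ so that an exactly even integer split exists; `compositions` is a hypothetical helper name.

```python
import math
from itertools import combinations

def shannon_index(counts):
    """H' = -sum(p_i ln p_i) with p_i = n_i / N."""
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts)

def compositions(N, S):
    """Yield every split of N individuals among S species, each with at least one."""
    # Stars and bars: choose S - 1 cut points among the N - 1 gaps.
    for cuts in combinations(range(1, N), S - 1):
        bounds = (0,) + cuts + (N,)
        yield [bounds[j + 1] - bounds[j] for j in range(S)]

N, S = 12, 4
best = max(compositions(N, S), key=shannon_index)

print(best)                                           # [3, 3, 3, 3]
print(math.isclose(shannon_index(best), math.log(S)))  # True
```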