Rand index

The Rand index or Rand measure is a measure of the similarity between two data clusterings.

Definition
Given a set of $$n$$ elements $$S = \{O_1, \ldots, O_n\}$$ and two partitions of $$S$$ to compare, $$X = \{x_1, \ldots, x_r\}$$ and $$Y = \{y_1, \ldots, y_s\}$$, we define the following:
 * $$a$$, the number of pairs of elements in $$S$$ that are in the same set in $$X$$ and in the same set in $$Y$$
 * $$b$$, the number of pairs of elements in $$S$$ that are in different sets in $$X$$ and in different sets in $$Y$$
 * $$c$$, the number of pairs of elements in $$S$$ that are in the same set in $$X$$ and in different sets in $$Y$$
 * $$d$$, the number of pairs of elements in $$S$$ that are in different sets in $$X$$ and in the same set in $$Y$$

The Rand index, $$R$$, is:

$$ R = \frac{a+b}{a+b+c+d} = \frac{a+b}{\binom{n}{2}} $$

The second equality holds because every unordered pair of elements falls into exactly one of the four cases, so $$a + b + c + d = \binom{n}{2}$$.

Intuitively, one can think of $$a + b$$ as the number of agreements between $$X$$ and $$Y$$ and $$c + d$$ as the number of disagreements between $$X$$ and $$Y$$.

The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.
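
The definition can be illustrated with a minimal Python sketch that counts the four pair types directly. It assumes each partition is given as a sequence of cluster labels, one label per element; the function name `rand_index` and this label-based representation are choices made for illustration, not part of the definition. The loop visits all $$\binom{n}{2}$$ pairs, so it is only practical for small $$n$$.

```python
from itertools import combinations


def rand_index(labels_x, labels_y):
    """Rand index of two clusterings given as per-element label sequences.

    labels_x[i] and labels_y[i] are the cluster labels of element i in
    partitions X and Y (a hypothetical representation for this sketch).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both labelings must cover the same elements")

    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # together in X and together in Y
        elif not same_x and not same_y:
            b += 1  # separated in X and separated in Y
        elif same_x:
            c += 1  # together in X, separated in Y
        else:
            d += 1  # separated in X, together in Y

    total = a + b + c + d  # equals n * (n - 1) / 2
    if total == 0:
        raise ValueError("need at least two elements")
    return (a + b) / total


# Five elements: a = 1, b = 7, c = 1, d = 1, so R = 8 / 10.
print(rand_index([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))  # 0.8
```

Label names do not matter, only which elements share a label: `rand_index([0, 0, 1], [5, 5, 9])` gives 1.0 because the two partitions are identical, while `rand_index([0, 0, 0], [0, 1, 2])` gives 0.0 because one partition groups every pair and the other separates every pair.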