Holland's schema theorem

Holland's schema theorem is widely taken to be the foundation for explanations of the power of genetic algorithms.

A schema is a template that identifies a subset of strings with similarities at certain string positions.
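The idea of a schema as a template can be made concrete with a small sketch. The code below assumes schemata are written as Python strings over the alphabet 0/1 with `*` as the "don't care" symbol; the function names `matches` and `instances` are hypothetical, chosen only for illustration.

```python
from itertools import product


def matches(schema: str, string: str) -> bool:
    """Return True if `string` is an instance of `schema` ('*' = don't care)."""
    if len(schema) != len(string):
        return False
    return all(s == "*" or s == c for s, c in zip(schema, string))


def instances(schema: str, alphabet: str = "01") -> list[str]:
    """Enumerate every string over `alphabet` that matches `schema`."""
    # Each '*' may take any symbol; each fixed position keeps its own symbol.
    slots = [alphabet if s == "*" else s for s in schema]
    return ["".join(chars) for chars in product(*slots)]


print(instances("1*0"))        # ['100', '110']
print(matches("1*0", "110"))   # True
print(matches("1*0", "011"))   # False
```

A binary schema with k "don't care" positions matches exactly 2^k strings, which is why `instances("1*0")` returns two strings.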

Description
For example, consider binary strings of length 6. The schema 1**0*1 describes the set of all strings of length 6 with 1s at positions 1 and 6 and a 0 at position 4. The * is a "don't care" symbol: positions 2, 3 and 5 can each take the value 1 or 0.

The order of a schema, $$ o(H) $$, is the number of fixed positions in the template, while the defining length $$ \delta(H) $$ is the distance between the first and last fixed positions. The order of 1**0*1 is 3 and its defining length is 6 − 1 = 5.

The fitness of a schema is the average fitness of all strings in the population that match the schema. The fitness of a string is a measure of the value of the encoded problem solution, as computed by a problem-specific evaluation function. With the standard genetic operators (fitness-proportionate selection, one-point crossover and mutation), the schema theorem states that short, low-order schemata with above-average fitness increase exponentially in successive generations. Expressed as an equation:


 * $$m(H,t+1) \geq {m(H,t) f(H) \over a_t}[1-p]$$

Here $$ m(H,t) $$ is the number of strings belonging to schema $$ H $$ at generation $$ t $$, $$ f(H) $$ is the observed fitness of schema $$ H $$ and $$ a_t $$ is the observed average fitness of the population at generation $$ t $$. The probability of disruption $$ p $$ is the probability that crossover or mutation will destroy the schema $$ H $$. It can be expressed as


 * $$p = {\delta(H) \over l-1}p_c + o(H) p_m$$

where $$ o(H) $$ is the order of the schema (its number of fixed positions), $$ l $$ is the length of the strings, $$ p_m $$ is the per-position probability of mutation and $$ p_c $$ is the probability of crossover. So a schema with a shorter defining length $$ \delta(H) $$ is less likely to be disrupted by crossover, and a schema of lower order is less likely to be disrupted by mutation.

An often misunderstood point is why the schema theorem is an inequality rather than an equality. The reason is that the theorem neglects the small but non-zero probability that a string belonging to the schema $$ H $$ will be created "from scratch" by mutation of a string that did not belong to $$ H $$ in the previous generation.
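The quantities in the two formulas above can be evaluated directly. The following is a minimal sketch, assuming schemata are given as strings with `*` as the "don't care" symbol and that the string length is at least 2; the function names and the numeric values in the usage lines (10 instances, schema fitness 1.5, average fitness 1.0, crossover probability 0.7, mutation probability 0.01) are made up for illustration and do not come from any particular experiment.

```python
def order(schema: str) -> int:
    """o(H): number of fixed (non-'*') positions."""
    return sum(ch != "*" for ch in schema)


def defining_length(schema: str) -> int:
    """delta(H): distance between the first and last fixed positions."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def disruption_probability(schema: str, p_c: float, p_m: float) -> float:
    """p = delta(H) / (l - 1) * p_c + o(H) * p_m, as in the text."""
    l = len(schema)
    if l < 2:
        raise ValueError("string length must be at least 2")
    return defining_length(schema) / (l - 1) * p_c + order(schema) * p_m


def schema_bound(m: float, f_H: float, a_t: float,
                 schema: str, p_c: float, p_m: float) -> float:
    """Lower bound on m(H, t+1): m(H,t) * f(H) / a_t * (1 - p)."""
    p = disruption_probability(schema, p_c, p_m)
    return m * f_H / a_t * (1 - p)


# The schema from the text: order 3, defining length 5.
print(order("1**0*1"), defining_length("1**0*1"))

# Two schemata with the same (hypothetical) fitness but different shapes.
for H in ("1**0*1", "11****"):
    print(H, schema_bound(10, 1.5, 1.0, H, p_c=0.7, p_m=0.01))
```

With these illustrative numbers the long schema 1**0*1 gets a bound below its current count of 10, while the short schema 11**** gets a bound above it, which is the sense in which short, low-order schemata are favoured. When $$ p $$ approaches or exceeds 1 the bound becomes vacuous, since the count can never be negative.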