Rule of succession

In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem.

The formula is still used, particularly to estimate underlying probabilities for events which have not been observed to occur at all in (finite) sample data. Assigning such events a zero probability would contravene Cromwell's rule, and is not justified by the evidence.

Statement of the rule of succession
Suppose p is uniformly distributed on the interval [0, 1]. Suppose X1, ..., Xn+1 are conditionally independent random variables given the value of p, and conditional on p are Bernoulli-distributed with expected value p, i.e., each has probability p of being equal to 1 and probability 1 − p of being equal to 0. Then


 * $$P(X_{n+1}=1 \mid X_1+\cdots+X_n=s)={s+1 \over n+2}.$$
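
As a minimal sketch, the rule can be evaluated exactly with rational arithmetic. The function name `rule_of_succession` and its argument names are illustrative choices, not part of any standard library.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Probability of a success on the next trial, given `successes`
    in `trials` observations and a uniform prior on p."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


print(rule_of_succession(0, 0))   # no data at all: 1/2
print(rule_of_succession(0, 10))  # never observed in 10 trials: 1/12
print(rule_of_succession(7, 10))  # 7 successes in 10 trials: 2/3
```

The second call illustrates the point made above: an event that has never been observed still receives a small nonzero probability.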

The probability that the sun will rise tomorrow
Let p be the long-run frequency of sunrises, i.e., the sun rises on (100 × p)% of days. Prior to knowing of any sunrises, one is completely ignorant of the value of p. Laplace represented this prior ignorance by means of a uniform probability distribution on p. Thus the probability that p is between 20% and 50% is just 30%. This must not be interpreted to mean that in 30% of all cases, p is between 20% and 50%; that would be a frequentist philosophy of applied probability. Rather, it means that one's state of knowledge (or ignorance) justifies one in being 30% sure that the sun rises between 20% of the time and 50% of the time; that is a Bayesian philosophy of applied probability.

Given the value of p, and no other information relevant to the question of whether the sun will rise tomorrow, the probability that the sun will rise tomorrow is p. But we are not "given the value of p". What we are given is the observed data: the sun has risen every day on record. Laplace inferred the number of days by saying that the universe was created about 6000 years ago, based on a literal construction of the Bible.

To find the conditional probability distribution of p given the data, one uses Bayes' theorem, which some call the Bayes-Laplace rule. Having found the conditional probability distribution of p given the data, one may then calculate the conditional probability, given the data, that the sun will rise tomorrow. That conditional probability is given by the rule of succession. The probability that the sun will rise tomorrow increases with the number of days on which the sun has risen so far, and would decrease with each day on which the sun failed to rise.

Mathematical details
The proportion p is treated as a uniformly distributed random variable. (Some who take an extreme Bayesian approach to applied probability insist that the word random should be banished altogether from probability theory, on the grounds of examples like this one. This proportion is not random, but uncertain. We assign a probability distribution to p to express our uncertainty, not to attribute randomness to p.)

Let Xi be the number of "successes" on the ith trial, with probability p of success on each trial. Thus each X is 0 or 1; each X has a Bernoulli distribution. Suppose these Xs are conditionally independent given p.

Bayes' theorem says that in order to get the conditional probability distribution of p given the data Xi, i = 1, ..., n, one multiplies the "prior" (i.e., marginal) probability measure assigned to p by the likelihood function


 * $$L(p)=P(X_1=x_1, \ldots, X_n=x_n \mid p)=\prod_{i=1}^n p^{x_i}(1-p)^{1-x_i}=p^s (1-p)^{n-s}$$

where s = x1 + ... + xn is the number of "successes" and n is of course the number of trials, and then normalizes, to get the "posterior" (i.e., conditional on the data) probability distribution of p. (We are using capital X to denote a random variable and lower-case x either as the dummy in the definition of a function or as the data actually observed.)

The prior probability density function is equal to 1 for 0 < p < 1 and equal to 0 for p < 0 or p > 1. To get the normalizing constant, we find


 * $$\int_0^1 p^s(1-p)^{n-s}\,dp={s!(n-s)! \over (n+1)!}$$

(see beta function for more on integrals of this form).
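
The closed form of this integral can be checked exactly for small n. The sketch below expands (1 − p)^(n−s) with the binomial theorem and integrates term by term; the helper names `beta_integral_exact` and `closed_form` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral_exact(s: int, n: int) -> Fraction:
    """Integrate p^s (1-p)^(n-s) over [0, 1] exactly.

    Uses (1-p)^m = sum_k C(m, k) (-1)^k p^k and the fact that the
    integral of p^(s+k) over [0, 1] is 1/(s+k+1).
    """
    m = n - s
    return sum(Fraction((-1) ** k * comb(m, k), s + k + 1) for k in range(m + 1))


def closed_form(s: int, n: int) -> Fraction:
    """The value s!(n-s)!/(n+1)! stated in the text."""
    return Fraction(factorial(s) * factorial(n - s), factorial(n + 1))


# Compare the two expressions for every 0 <= s <= n < 8.
for n in range(8):
    for s in range(n + 1):
        assert beta_integral_exact(s, n) == closed_form(s, n)

print(beta_integral_exact(2, 5))  # 1/60
```

This is a spot check over a finite range, not a proof; the general identity follows from the beta function.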

The posterior probability density function is therefore


 * $$f(p)={(n+1)! \over s!(n-s)!}p^s(1-p)^{n-s}.$$

This is a beta distribution with parameters s + 1 and n − s + 1, whose expected value is


 * $${s+1 \over n+2}.$$
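
For readers who want the intermediate step, the expected value can be worked out directly from the posterior density f(p), applying the integral formula above with s + 1 in place of s and n + 1 in place of n:

 * $$\int_0^1 p\,f(p)\,dp=\frac{(n+1)!}{s!(n-s)!}\int_0^1 p^{s+1}(1-p)^{n-s}\,dp=\frac{(n+1)!}{s!(n-s)!}\cdot\frac{(s+1)!(n-s)!}{(n+2)!}=\frac{s+1}{n+2}.$$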

Since the conditional probability of tomorrow's sunrise, given the value of p, is just p, the law of total probability tells us that the probability of tomorrow's sunrise is just the expected value of p. Since all of this is conditional on the observed data Xi for i = 1, ..., n, we have


 * $$P(X_{n+1}=1 \mid X_i=x_i\ \mbox{for}\ i=1,\dots,n)={s+1 \over n+2}.$$
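
The result can also be checked empirically. The following is a rough Monte Carlo sketch, assuming the model exactly as stated (p drawn uniformly, then n + 1 conditionally independent Bernoulli trials); the function name `simulate`, the run count, and the seed are arbitrary choices made for illustration.

```python
import random


def simulate(n: int, s: int, runs: int = 200_000, seed: int = 0) -> float:
    """Estimate P(X_{n+1} = 1 | X_1 + ... + X_n = s) by simulation."""
    rng = random.Random(seed)
    matches = 0          # runs in which exactly s of the first n trials succeed
    next_successes = 0   # among those runs, how often trial n+1 succeeds
    for _ in range(runs):
        p = rng.random()                                  # uniform prior draw
        count = sum(rng.random() < p for _ in range(n))   # first n trials
        if count == s:
            matches += 1
            next_successes += rng.random() < p            # trial n+1
    return next_successes / matches


estimate = simulate(n=10, s=7)
exact = (7 + 1) / (10 + 2)
print(f"simulated: {estimate:.4f}   rule of succession: {exact:.4f}")
```

With these settings roughly one run in eleven matches the conditioning event, so the estimate should agree with (s + 1)/(n + 2) to about two decimal places.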

Thus if the sun has risen every morning for 4,000 million years (1,460,000,000,000 consecutive mornings, counting 365 mornings per year), and no other data are available, Laplace would have us conclude that the probability of tomorrow's sunrise is


 * $${1,\!460,\!000,\!000,\!001 \over 1,\!460,\!000,\!000,\!002}.$$

The probability that the sun will not rise tomorrow would be 1 in 1,460,000,000,002, or about 6.8 × 10^−13, which is slightly more than two in three trillion.
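
The arithmetic can be reproduced exactly with rational numbers. This sketch assumes a year of 365 mornings, which is what yields the count of 1,460,000,000,000 used in the text; the variable names are illustrative.

```python
from fractions import Fraction

# 4,000 million years at 365 mornings per year (leap days ignored, an assumption)
mornings = 4_000_000_000 * 365

p_rise = Fraction(mornings + 1, mornings + 2)  # rule of succession with s = n
p_fail = 1 - p_rise

print(p_rise)         # 1460000000001/1460000000002
print(p_fail)         # 1/1460000000002
print(float(p_fail))  # roughly 6.85e-13
```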