Memorylessness


 * For the use of the term in materials science see hysteresis.

In probability theory, memorylessness is a property of certain probability distributions: the exponential distributions and the geometric distributions. Informally, it means that the probability of waiting some additional amount of time (or number of trials) does not depend on how long one has already waited; the process retains no information (i.e. "memory") of the past.

For example, suppose a die is thrown as many times as it takes to get a "1", so that the probability of "success" on each trial is 1/6, and the random variable X is the number of times the die must be thrown. Then X has a geometric distribution, and the conditional probability that the die must be thrown at least four more times to get a "1", given that it has already been thrown 10 times without a "1" appearing, is no different from the original probability that the die would be thrown at least four times. In effect, the random process does not "remember" how many failures have occurred so far.
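The die example can be checked with exact numbers. A minimal sketch in Python (the function name `survival` is illustrative; it uses the standard fact that for the geometric distribution, Pr(X > n) = (1 − p)^n, since the first n throws must all fail):

```python
# Survival function Pr(X > n) of the geometric distribution:
# the first n trials must all fail, so Pr(X > n) = (1 - p)^n.
def survival(n, p=1/6):
    return (1 - p) ** n

# Conditional probability Pr(X > 13 | X > 10):
# "at least four more throws, given 10 failures so far".
conditional = survival(13) / survival(10)

# Unconditional probability Pr(X > 3): "at least four throws".
unconditional = survival(3)

# The two values agree (up to floating-point rounding).
```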

Discrete memorylessness
Suppose X is a discrete random variable whose values lie in the set { 0, 1, 2, ... } or in the set { 1, 2, 3, ... }. The probability distribution of X is memoryless precisely if for any x, y in { 0, 1, 2, ... } or in { 1, 2, 3, ... } (as the case may be), we have


 * $$\Pr(X>x+y \mid X>x)=\Pr(X>y).$$

Here, Pr(X > x + y | X > x) denotes the conditional probability that the value of X is larger than x + y, given that it is larger than x.

It can readily be shown that the only probability distributions that enjoy this discrete memorylessness are geometric distributions. These are the distributions of the number of independent Bernoulli trials needed to get one "success", with a fixed probability p of "success" on each trial.
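The Bernoulli-trials interpretation can also be illustrated empirically. A short simulation sketch (the success probability p = 1/6, the sample size, and the fixed seed are arbitrary choices for reproducibility):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def trials_until_success(p=1/6):
    """Number of independent Bernoulli(p) trials until the first success."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

samples = [trials_until_success() for _ in range(100_000)]

# Empirical Pr(X > 13 | X > 10): restrict to runs that survived 10 trials.
past_10 = [x for x in samples if x > 10]
conditional = sum(x > 13 for x in past_10) / len(past_10)

# Empirical Pr(X > 3): the two estimates should be close.
unconditional = sum(x > 3 for x in samples) / len(samples)
```

Both estimates hover near the exact value (5/6)^3 ≈ 0.579, within sampling error.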

A frequent misunderstanding
Memorylessness is often misunderstood by students taking courses on probability: the fact that Pr(X > 13 | X > 10) = Pr(X > 3) does not mean that the events X > 13 and X > 10 are independent; i.e. it does not mean that Pr(X > 13 | X > 10) = Pr(X > 13). To summarize: "memorylessness" of the probability distribution of the number of trials X until the first success means


 * $$\mathrm{(Right)}\ \Pr(X>13 \mid X>10)=\Pr(X>3).\,$$

It does not mean


 * $$\mathrm{(Wrong)}\ \Pr(X>13 \mid X>10)=\Pr(X>13).\,$$

(That would be independence. These two events are not independent.)
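The distinction can be made concrete with exact numbers. A quick sketch (again with p = 1/6 for the die; the helper name `survival` is illustrative):

```python
p = 1 / 6

def survival(n):
    """Pr(X > n) = (1 - p)^n for the geometric distribution."""
    return (1 - p) ** n

right = survival(13) / survival(10)   # Pr(X > 13 | X > 10)
# (Right): equals Pr(X > 3), about 0.579
# (Wrong): does NOT equal Pr(X > 13), about 0.094

# Independence would require Pr(X > 13 and X > 10) = Pr(X > 13) * Pr(X > 10),
# but the intersection of the two events is just {X > 13}.
joint = survival(13)                   # Pr(X > 13 and X > 10)
product = survival(13) * survival(10)  # what independence would predict
```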

Continuous memorylessness
Suppose that, rather than considering the discrete number of trials until the first "success", we consider the continuous waiting time T until the arrival of the first phone call at a switchboard. To say that the probability distribution of T is memoryless means that for any positive real numbers s and t, we have


 * $$\Pr(T>t+s \mid T>t)=\Pr(T>s).\,$$

This is similar to the discrete version except that s and t are constrained only to be positive (or sometimes non-negative) real numbers instead of integers.
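For the exponential distribution the identity can be verified directly from its survival function Pr(T > t) = e^(−λt); the rate λ = 2 and the times below are arbitrary values chosen for illustration:

```python
import math

rate = 2.0  # assumed rate parameter (lambda); any positive value works

def survival(t):
    """Pr(T > t) = exp(-lambda * t) for the exponential distribution."""
    return math.exp(-rate * t)

t, s = 1.5, 0.7
conditional = survival(t + s) / survival(t)  # Pr(T > t+s | T > t)
# Memorylessness: this equals survival(s) = Pr(T > s),
# since exp(-rate*(t+s)) / exp(-rate*t) = exp(-rate*s).
```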

Characterization by memorylessness
Memorylessness completely characterizes the exponential distributions, i.e. the only probability distributions that enjoy (continuous) memorylessness are the exponential distributions.

To see this, first let


 * $$G(t) = \Pr(T > t).\,$$

Then the basic laws of probability imply that G is non-increasing: G(t) can only get smaller as t gets bigger. From the relation


 * $$\Pr(T > t + s \mid T > t) = \Pr(T > s)\,$$

and the definition of conditional probability, we get


 * $${\Pr(T > t + s) \over \Pr(T > t)} = \Pr(T > s).$$

Thus we have the functional equation


 * $$G(t + s) = G(t) G(s)\,$$

and G is a monotone non-increasing function.

The functional equation alone implies that G, restricted to rational multiples of any particular number, is an exponential function. Combined with the fact that G is monotone, this implies that G is an exponential function on its whole domain.
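The rational case can be spelled out. Writing λ = −ln G(1) (assuming 0 < G(1) < 1), the functional equation gives, for positive integers n and m,

```latex
G(n) = G(\underbrace{1 + \cdots + 1}_{n}) = G(1)^n,
\qquad
G(1) = G\!\left(\tfrac{1}{m} + \cdots + \tfrac{1}{m}\right)
     = G\!\left(\tfrac{1}{m}\right)^{m}
\;\Longrightarrow\;
G\!\left(\tfrac{1}{m}\right) = G(1)^{1/m},
```

so that G(n/m) = G(1)^{n/m} = e^{−λ n/m} for every positive rational n/m. Since the rationals are dense in the reals and G is monotone, G(t) = e^{−λt} for all t > 0, which is exactly the survival function of an exponential distribution.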