Law of total expectation

The proposition in probability theory known as the law of total expectation (also called the law of iterated expectations, the tower rule, or the smoothing theorem, among other names) states that if X is an integrable random variable (i.e., a random variable satisfying E(|X|) < ∞) and Y is any random variable, not necessarily integrable, on the same probability space, then


 * $$\operatorname{E} (X) = \operatorname{E} ( \operatorname{E} ( X \mid Y)),$$

i.e., the expected value of the conditional expected value of X given Y is the same as the expected value of X.

The nomenclature used here parallels the phrase law of total probability. See also law of total variance.

(The conditional expected value E( X | Y ) is a random variable in its own right, whose value depends on the value of Y. Note that the conditional expected value of X given the event Y = y is a function of y; this is where adherence to the conventional, rigidly case-sensitive notation of probability theory matters: capital Y denotes the random variable, while lower-case y denotes a particular value it may take. If we write E( X | Y = y ) = g(y), then the random variable E( X | Y ) is simply g(Y).)
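The identity can be checked by simulation. The following sketch uses a made-up two-stage experiment (not from the text): Y is Bernoulli(0.3), and given Y = y, X is drawn from a normal distribution with mean 2y, so that g(Y) = E( X | Y ) = 2Y and both sample averages should approach E(X) = 0.6:

```python
import random

random.seed(0)

# Hypothetical two-stage experiment: Y ~ Bernoulli(0.3),
# and given Y = y, X ~ Normal(mean = 2*y, sd = 1).
# Then E(X | Y) = 2*Y, so E(E(X | Y)) = 2 * 0.3 = 0.6 = E(X).
n = 200_000
total_x = 0.0
total_g = 0.0
for _ in range(n):
    y = 1 if random.random() < 0.3 else 0
    x = random.gauss(2 * y, 1.0)
    total_x += x      # samples for E(X)
    total_g += 2 * y  # samples for E(E(X | Y)), using g(Y) = 2Y

print(total_x / n)  # close to 0.6
print(total_g / n)  # close to 0.6
```

Both running averages converge to the same limit, as the law predicts; the inner average has lower variance because the noise in X around its conditional mean is averaged out analytically.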

Proof in the discrete case


Let X and Y both be discrete random variables. Then

$$\begin{align} \operatorname{E} \left( \operatorname{E}(X|Y) \right) &{} = \sum\limits_y \operatorname{E}(X|Y=y) \cdot \operatorname{P}(Y=y) \\ &{}=\sum\limits_y \left( \sum\limits_x x \cdot \operatorname{P}(X=x|Y=y) \right) \cdot \operatorname{P}(Y=y) \\ &{}=\sum\limits_y \sum\limits_x x \cdot \operatorname{P}(X=x|Y=y) \cdot \operatorname{P}(Y=y) \\ &{}=\sum\limits_y \sum\limits_x x \cdot \operatorname{P}(Y=y|X=x) \cdot \operatorname{P}(X=x) \\ &{}=\sum\limits_x \sum\limits_y x \cdot \operatorname{P}(Y=y|X=x) \cdot \operatorname{P}(X=x) \\ &{}=\sum\limits_x x \cdot \operatorname{P}(X=x) \cdot \left( \sum\limits_y \operatorname{P}(Y=y|X=x) \right) \\ &{}=\sum\limits_x x \cdot \operatorname{P}(X=x) \\ &{}=\operatorname{E}(X). \end{align}$$

Here the interchange of the order of summation is justified because the integrability of X makes the double sum absolutely convergent, and the final step uses the fact that the conditional probabilities P(Y = y | X = x) sum to 1 over y.
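The sums in this proof can be mirrored by direct enumeration. The sketch below uses a small hypothetical joint distribution (the numbers are illustrative, not from the text) and exact rational arithmetic, so the two sides agree exactly rather than approximately:

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X = x, Y = y); probabilities sum to 1.
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 4),
    (1, 0): F(1, 4), (1, 1): F(1, 8),
    (2, 0): F(1, 8), (2, 1): F(1, 8),
}

# Marginal P(Y = y).
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

def g(y):
    # g(y) = E(X | Y = y) = sum_x x * P(X = x | Y = y)
    return sum(x * p / p_y[y] for (x, yy), p in joint.items() if yy == y)

# Left side: E(E(X | Y)) = sum_y g(y) * P(Y = y).
lhs = sum(g(y) * py for y, py in p_y.items())

# Right side: E(X) computed directly from the joint distribution.
rhs = sum(x * p for (x, _), p in joint.items())

print(lhs, rhs, lhs == rhs)  # 7/8 7/8 True
```

Swapping the order of the two sums, as the proof does, corresponds to iterating over the same dictionary of joint probabilities grouped by x instead of by y; since the sum is finite here, the interchange is trivially valid.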

Iterated expectations with nested conditioning sets
The following formulation of the law of iterated expectations plays an important role in many macroeconomic and finance models:


 * $$\operatorname{E} (X \mid I_1) = \operatorname{E} ( \operatorname{E} ( X \mid I_2) \mid I_1),$$

where I1 is a subset of I2 (that is, I2 contains at least as much information as I1). To build intuition, imagine an investor who forecasts a random stock price X based on the limited information set I1. The law of iterated expectations says that the investor can never obtain a more precise forecast of X by conditioning on the more specific information set I2, if the more specific forecast must itself be forecast using only the original information I1.

This formulation is often applied in a time-series context, where E_t denotes expectation conditional on the information observed through time period t. In typical models the information set at time t + 1 contains all information available through time t, plus additional information revealed at time t + 1. One can then write:


 * $$\operatorname{E}_t (X) = \operatorname{E}_t ( \operatorname{E}_{t+1} ( X )).$$
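The time-series version can be illustrated on a small hypothetical example: a three-step symmetric random walk, where X is the position after step 3 and E_t conditions on the moves observed through step t. The sketch below verifies, for each possible first move, that forecasting tomorrow's forecast gives the same answer as forecasting X directly:

```python
from fractions import Fraction as F
from itertools import product

p = F(1, 2)        # each move is +1 or -1 with probability 1/2
moves = (+1, -1)

def e2(m1, m2):
    # E_2(X): the first two moves are known; average over the third.
    return sum(p * (m1 + m2 + m3) for m3 in moves)

def e1_of_x(m1):
    # E_1(X): only the first move is known; average over the last two.
    return sum(p * p * (m1 + m2 + m3) for m2, m3 in product(moves, repeat=2))

def e1_of_e2(m1):
    # E_1(E_2(X)): average the time-2 forecast over the second move.
    return sum(p * e2(m1, m2) for m2 in moves)

for m1 in moves:
    print(m1, e1_of_x(m1), e1_of_e2(m1))  # the two forecasts agree
```

Here the time-1 information set (the first move) is contained in the time-2 information set (the first two moves), so the nested-conditioning formula applies, and both forecasts equal m1 for each starting move.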