Chernoff bounds

Herman Chernoff described a method for deriving bounds on the tail probabilities of sums of random variables. All bounds derived using his method are now known as Chernoff bounds.

Chernoff's method centers on bounding a random variable $$X$$, typically the sum of a sequence of random variables, by studying the random variable $$e^{tX}$$ rather than $$X$$ itself. There are many flavors of Chernoff bounds; we present a bound on the absolute error of the empirical mean of a series of experiments as well as a bound on the relative error.
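As a quick numerical illustration of this idea, the following sketch (Python with NumPy and SciPy; the binomial example, the grid of values of $$t$$, and all variable names are illustrative choices, not part of any particular formulation) applies Markov's inequality to $$e^{tX}$$ for a binomial sum and compares the best resulting bound with the exact tail probability.

```python
# A minimal sketch of the exponentiation trick: for X ~ Binomial(m, p),
# Markov's inequality applied to e^{tX} gives Pr[X >= a] <= E[e^{tX}] / e^{ta}
# for every t > 0, so we may take the smallest such bound over a grid of t.
import numpy as np
from scipy.stats import binom

m, p, a = 100, 0.5, 60
ts = np.linspace(0.01, 2.0, 400)            # grid of candidate values of t
mgf = (1 - p + p * np.exp(ts)) ** m         # E[e^{tX}] for a Binomial(m, p) variable
chernoff = np.min(mgf / np.exp(ts * a))     # best bound found on the grid
exact = binom.sf(a - 1, m, p)               # exact tail Pr[X >= a]

print(f"Chernoff-style bound: {chernoff:.4e}   exact tail: {exact:.4e}")
```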

Theorem (absolute error)
The following theorem is due to Wassily Hoeffding. Assume the random variables $$X_1, X_2, \ldots, X_m$$ are i.i.d. with $$X_i \in \{0,1\}$$. Let $$p = E \left [X_i \right ]$$ and $$\varepsilon > 0$$. Then



$$ \Pr\left[ \frac 1 m \sum X_i \geq p + \varepsilon \right] \leq \left ( {\left (\frac{p}{p + \varepsilon}\right )}^{p+\varepsilon} {\left (\frac{1 - p}{1 -p - \varepsilon}\right )}^{1 - p- \varepsilon}\right ) ^m = e^{ - D(p+\varepsilon\|p) m} $$ and

$$ \Pr\left[ \frac 1 m \sum X_i \leq p - \varepsilon \right] \leq \left ( {\left (\frac{p}{p - \varepsilon}\right )}^{p-\varepsilon} {\left (\frac{1 - p}{1 -p + \varepsilon}\right )}^{1 - p+ \varepsilon}\right ) ^m = e^{ - D(p-\varepsilon\|p) m}, $$ where

$$ D(x\|y) = x \log \frac{x}{y} + (1-x) \log \frac{1-x}{1-y} $$ is the Kullback-Leibler divergence between Bernoulli distributed random variables with parameters $$x$$ and $$y$$ respectively.
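To make the statement concrete, here is a small numerical sketch of the upper-tail bound; the parameter values and the helper name kl_bernoulli below are illustrative choices of ours, not part of the theorem.

```python
# A sketch checking e^{-D(p+eps || p) m} against the exact binomial tail probability.
import math
from scipy.stats import binom

def kl_bernoulli(x, y):
    """Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

p, eps, m = 0.5, 0.1, 100                          # illustrative parameter choices
bound = math.exp(-kl_bernoulli(p + eps, p) * m)    # e^{-D(p+eps || p) m}
k = round((p + eps) * m)                           # integer threshold m(p + eps)
exact = binom.sf(k - 1, m, p)                      # Pr[(1/m) sum X_i >= p + eps]

print(f"bound: {bound:.4f}   exact tail probability: {exact:.4f}")
```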

Proof
The proof of this result relies on Markov's inequality for positive-valued random variables. First, let us set $$q = p + \varepsilon$$ in our bound for ease of notation. Letting $$\lambda > 0$$ be an arbitrary positive real, we see that

$$ \Pr\left[ \sum X_i \ge mq \right] = \Pr\left[ e^{\sum X_i} \ge e^{mq}\right] = \Pr\left[\prod e^{X_i} \ge e^{mq}\right] = \Pr\left[\prod e^{\lambda X_i} \ge e^{\lambda m q}\right] $$ Applying Markov's inequality to the last expression, we see that

$$ \Pr\left[ \frac{1}{m} \sum X_i \ge q\right] \le \frac{E \left[\prod e^{\lambda X_i}\right]}{e^{\lambda mq}} = \left[\frac{ E\left[e^{\lambda X_i} \right] }{e^{\lambda q}}\right]^m $$ where the equality follows from the independence of the $$m$$ $$X_i$$'s. Now, knowing that $$\Pr[X_i = 1] = p$$, $$\Pr[X_i = 0] = (1-p)$$, we have

$$ \left[\frac{ E\left[e^{\lambda X_i} \right] }{e^{\lambda q}}\right]^m = \left[\frac{p e^\lambda + (1-p)}{e^{\lambda q} }\right]^m = [pe^{(1-q)\lambda} + (1-p)e^{-q\lambda}]^m. $$

Because $$\lambda$$ is arbitrary, we can minimize the above expression with respect to $$\lambda > 0$$; since the logarithm is monotone, it suffices to minimize the logarithm of the bracketed quantity. Differentiating,

$$ \begin{align} \frac{d}{d\lambda} \log(pe^{(1-q)\lambda} + (1-p)e^{-q\lambda}) & = \frac{1}{pe^{(1-q)\lambda} + (1-p)e^{-q\lambda}} ((1-q)pe^{(1-q)\lambda}-q(1-p)e^{-q\lambda}) \\ & = -q + \frac{pe^{(1-q)\lambda}}{pe^{(1-q)\lambda}+(1-p)e^{-q\lambda}} \end{align} $$ Setting the last equation to zero and solving, we have

$$ \begin{align} q & = \frac{pe^{(1-q)\lambda}}{pe^{(1-q)\lambda}+(1-p)e^{-q\lambda}} = \frac{pe^{(1-q)\lambda}}{e^{-q\lambda}(pe^{\lambda}+(1-p))} \\ pe^{(1-q)\lambda} & = pe^{-q\lambda}e^\lambda = qe^{-q\lambda}(pe^{\lambda}+1-p) \\ \frac{p}{q}e^\lambda & = pe^\lambda + 1-p \end{align} $$ so that $$e^\lambda = (1-p)\left(\frac{p}{q}-p\right)^{-1}$$. Thus, $$\lambda = \log\left(\frac{(1-p)q}{(1-q)p}\right)$$. As $$q = p+\varepsilon > p$$, we see that $$\lambda > 0$$, so our requirement that $$\lambda$$ be positive is satisfied (a numerical check of this minimizer is sketched after the proof). Having solved for $$\lambda$$, we can plug it back into the equations above to find that

$$ \begin{align} \log(pe^{(1-q)\lambda} + (1-p)e^{-q\lambda}) &= \log[e^{-q\lambda}(1-p+pe^\lambda)] \\ & = \log\left[e^{-q \log\left(\frac{(1-p)q}{(1-q)p}\right)}\right] + \log\left[1-p+pe^{\log\left(\frac{1-p}{1-q}\right)}e^{\log\frac{q}{p}}\right] \\ & = -q\log\frac{1-p}{1-q} -q \log\frac{q}{p} + \log\left[1-p+ p\left(\frac{1-p}{1-q}\right)\frac{q}{p}\right] \\ & = -q\log\frac{1-p}{1-q} -q \log\frac{q}{p} + \log\left[\frac{(1-p)(1-q)}{1-q}+\frac{(1-p)q}{1-q}\right] \\ & = -q\log\frac{q}{p} + (1-q)\log\frac{1-p}{1-q} = -D(q \| p). \end{align} $$ We now have our desired result, that

$$ \Pr\left[\frac{1}{m}\sum X_i \ge p + \varepsilon\right] \le e^{-D(p+\varepsilon\|p) m}. $$

To complete the proof for the symmetric case, we simply define the random variable $$Y_i = 1-X_i$$, apply the same proof, and plug into our bound.
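As a sanity check on the calculus step in the proof, the closed-form minimizer $$\lambda = \log\left(\frac{(1-p)q}{(1-q)p}\right)$$ can be compared against a brute-force search; in the sketch below the values of $$p$$ and $$\varepsilon$$ are arbitrary illustrative choices.

```python
# A sketch verifying the closed-form minimizer of p*e^{(1-q)L} + (1-p)*e^{-qL} over L > 0.
import numpy as np

p, eps = 0.3, 0.1            # arbitrary illustrative values
q = p + eps
lam_closed = np.log((1 - p) * q / ((1 - q) * p))    # closed form derived in the proof

lams = np.linspace(1e-4, 5.0, 200_000)              # brute-force grid over lambda > 0
vals = p * np.exp((1 - q) * lams) + (1 - p) * np.exp(-q * lams)
lam_grid = lams[np.argmin(vals)]

print(f"closed form: {lam_closed:.4f}   grid minimum: {lam_grid:.4f}")
```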

Simpler bounds
A simpler bound follows by relaxing the theorem using $$D( p + x \| p) \geq 2 x^2$$, which yields $$\Pr\left[ \frac 1 m \sum X_i \geq p + \varepsilon \right] \leq e^{-2\varepsilon^2 m}$$, a special case of Hoeffding's inequality. Sometimes, the bound $$D( (1+x) p \| p) \geq x^2 p/4$$ for $$-1/2 \leq x \leq 1/2$$, which is stronger for $$p<1/8$$, is also used.
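The sketch below compares the divergence with both relaxations for one illustrative choice of $$p$$ and $$x$$ (the values and the helper name kl_bernoulli are ours); with $$p < 1/8$$ the second relaxation gives the larger, i.e. stronger, lower bound.

```python
# A sketch comparing D(p+eps || p) with the two relaxations quoted above.
import math

def kl_bernoulli(a, b):
    """Kullback-Leibler divergence D(a || b) for Bernoulli parameters a and b."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

p, x = 0.05, 0.3                   # illustrative values with p < 1/8
eps = x * p                        # the same deviation, written additively
print(f"D(p + eps || p) = {kl_bernoulli(p + eps, p):.6f}")
print(f"2 * eps^2       = {2 * eps ** 2:.6f}   (Hoeffding-type relaxation)")
print(f"x^2 * p / 4     = {x ** 2 * p / 4:.6f}   (relative-deviation relaxation)")
```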

Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.

Theorem (relative error)
Let $$X_1, X_2, \ldots, X_n$$ be independent random variables taking on values 0 or 1. Further, assume that $$\Pr(X_i = 1) = p_i$$. Then, if we let $$X = \sum_{i=1}^n X_i$$ and $$\mu = E[X] = \sum_{i=1}^n p_i$$, for any $$\delta > 0$$

$$ \Pr \left[ X > (1+\delta)\mu\right] < \left(\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right)^\mu. $$
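In the special case where all $$p_i$$ are equal, $$X$$ is binomial and the bound can be compared directly with the exact tail; the values of $$n$$, $$p$$, and $$\delta$$ in the sketch below are illustrative choices of ours, not part of the theorem.

```python
# A sketch of the multiplicative bound in the equal-p_i special case (X ~ Binomial(n, p)).
import math
from scipy.stats import binom

n, p, delta = 200, 0.1, 0.5
mu = n * p
bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
k = round((1 + delta) * mu)        # integer threshold (1 + delta) * mu
exact = binom.sf(k, n, p)          # Pr[X > (1 + delta) mu]

print(f"bound: {bound:.4e}   exact tail probability: {exact:.4e}")
```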

Proof
For any $$t > 0$$, we have that $$\Pr[X > (1+\delta)\mu] = \Pr[\exp(tX) > \exp(t(1+\delta)\mu)]$$. Applying Markov's inequality to the right-hand side of the previous formula (noting that $$\exp(tX)$$ is always a positive random variable), we have

$$ \Pr[X > (1+\delta)\mu] \le \frac{\mathbf{E}[\exp(tX)]}{\exp(t(1+\delta)\mu)}. $$

Noting that $$\mathbf{E}[\exp(tX)] = \mathbf{E}\left[\exp\left(t\sum_{i=1}^n X_i\right)\right] = \mathbf{E}\left[\prod_{i=1}^n\exp(tX_i)\right]$$, we can begin to bound $$\Pr[X > (1+\delta)\mu]$$. We have

$$ \begin{align} \Pr[X > (1 + \delta)\mu] & \le \frac{\mathbf{E}\left[\prod_{i=1}^n\exp(tX_i)\right]}{\exp(t(1+\delta)\mu)} \\ & = \frac{\prod_{i=1}^n\mathbf{E}[\exp(tX_i)]}{\exp(t(1+\delta)\mu)} \\ & = \frac{\prod_{i=1}^n\left[p_i\exp(t) + (1-p_i)\right]}{\exp(t(1+\delta)\mu)} \end{align} $$ The second line above follows because of the independence of the $$X_i$$s, and the third line follows because $$\exp(tX_i)$$ takes the value $$e^t$$ with probability $$p_i$$ and the value $$1$$ with probability $$1-p_i$$. Re-writing $$p_i\exp(t) + (1-p_i)$$ as $$p_i(\exp(t)-1) + 1$$ and recalling that $$1+x \le \exp(x)$$ (with strict inequality if $$x > 0$$), we set $$x = p_i(\exp(t)-1)$$. Thus

$$ \Pr[X > (1+\delta)\mu] < \frac{\prod_{i=1}^n\exp(p_i(e^t-1))}{\exp(t(1+\delta)\mu)} = \frac{\exp\left((e^t-1)\sum_{i=1}^n p_i\right)}{\exp(t(1+\delta)\mu)} = \frac{\exp((e^t-1)\mu)}{\exp(t(1+\delta)\mu)}. $$

If we simply set $$t = \log(1+\delta)$$ so that $$ t > 0$$ for $$\delta > 0$$, we can substitute and find

$$ \frac{\exp((e^t-1)\mu)}{\exp(t(1+\delta)\mu)} = \frac{\exp((1+\delta - 1)\mu)}{(1+\delta)^{(1+\delta)\mu}} = \left[\frac{\exp(\delta)}{(1+\delta)^{(1+\delta)}}\right]^\mu $$ This proves the desired result. A similar proof strategy can be used to show that

$$ \Pr[X < (1-\delta)\mu] < \exp(-\mu\delta^2/2). $$
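The same equal-$$p_i$$ special case gives a quick check of this lower-tail bound; as before, the parameter values in the sketch are illustrative choices.

```python
# A sketch of the lower-tail bound exp(-mu * delta^2 / 2) in the equal-p_i case.
import math
from scipy.stats import binom

n, p, delta = 200, 0.1, 0.5
mu = n * p
bound = math.exp(-mu * delta ** 2 / 2)
k = round((1 - delta) * mu)        # integer threshold (1 - delta) * mu
exact = binom.cdf(k - 1, n, p)     # Pr[X < (1 - delta) mu]

print(f"bound: {bound:.4e}   exact tail probability: {exact:.4e}")
```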