Delta method

In statistics, the delta method is a method for deriving an approximate probability distribution for a function of an asymptotically normal statistical estimator from knowledge of the limiting variance of that estimator. More broadly, the delta method may be considered a fairly general central limit theorem.

Univariate delta method
While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, for some sequence of random variables $$X_n$$ satisfying
 * $${\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,N(0,\sigma^2)},$$

where $$\theta$$ and $$\sigma^2$$ are finite valued constants and $$\xrightarrow{D}$$ denotes convergence in distribution, it is the case that
 * $${\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,N(0,\sigma^2[g'(\theta)]^2)}$$

for any function g satisfying the property that $$g'(\theta)$$ exists and is non-zero valued. (The final restriction is really only needed for purposes of clarity in argument and application. Should the first derivative evaluate to zero at $$\theta$$, then the delta method may be extended via use of a second or higher order Taylor series expansion.)
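
As an informal numerical illustration (not part of the original argument; the choice of distribution, the function $$g(x)=x^2$$, and all parameter values below are assumptions made for this sketch), one can simulate the statement above and compare the empirical variance of $$\sqrt{n}[g(X_n)-g(\theta)]$$ with $$\sigma^2[g'(\theta)]^2$$:

```python
# A minimal simulation sketch (assumed setup, not from the article): X_n is the
# sample mean of n i.i.d. Exponential(1) draws, so theta = 1 and sigma^2 = 1.
# With g(x) = x**2 we have g'(theta) = 2, so the limiting variance of
# sqrt(n) * (g(X_n) - g(theta)) should be sigma^2 * g'(theta)**2 = 4.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 5_000
theta, sigma2 = 1.0, 1.0

# Each row is one replication; the row mean is one realization of X_n.
X_n = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

g = lambda x: x ** 2
z = np.sqrt(n) * (g(X_n) - g(theta))

print(np.var(z))  # should be close to sigma2 * (2 * theta) ** 2 = 4
```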

Proof in the univariate case
Demonstration of this result is fairly straightforward under the assumption that $$g'$$ is continuous. To begin, we construct a first-order Taylor series expansion of $$g(X_n)$$ around $$\theta$$:
 * $$g(X_n)=g(\theta)+g'(\tilde{\theta})(X_n-\theta),$$

where $$\tilde{\theta}$$ lies between $$X_n$$ and $$\theta$$. Note that $$X_n\,\xrightarrow{P}\,\theta$$ (since $$\sqrt{n}[X_n-\theta]$$ converges in distribution while $$1/\sqrt{n}\to 0$$), which implies $$\tilde{\theta}\,\xrightarrow{P}\,\theta$$. Since $$g'$$ is continuous, applying the continuous mapping theorem yields
 * $$g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta),$$

where $$\xrightarrow{P}$$ denotes convergence in probability.

Rearranging the terms and multiplying by $$\sqrt{n}$$ gives
 * $$\sqrt{n}[g(X_n)-g(\theta)]=g'(\tilde{\theta})\sqrt{n}[X_n-\theta].$$

Since
 * $${\sqrt{n}[X_n-\theta] \xrightarrow{D} N(0,\sigma^2)}$$

by assumption, it follows immediately from appeal to Slutsky's Theorem that
 * $${\sqrt{n}[g(X_n)-g(\theta)] \xrightarrow{D} N(0,\sigma^2[g'(\theta)]^2)}.$$

This concludes the proof.

Motivation of multivariate delta method
By definition, a consistent estimator $$B$$ converges in probability to its true value $$\beta$$, and often a central limit theorem can be applied to obtain asymptotic normality:
 * $$\sqrt{n}\left(B-\beta\right)\,\xrightarrow{D}\,N\left(0, \operatorname{Var}(B) \right),$$

where $$n$$ is the number of observations. Suppose we want to estimate the variance of a function $$h$$ of the estimator $$B$$. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate $$h(B)$$ as
 * $$h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta),$$

which implies the variance of $$h(B)$$ is approximately
 * $$\begin{align} \operatorname{Var}\left(h(B)\right) & \approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)\right) \\ & = \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\ & = \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\ & = \nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta), \end{align}$$

where the third line uses the fact that $$h(\beta)$$ and $$\nabla h(\beta)^T \cdot \beta$$ are non-random constants and therefore do not contribute to the variance.

The delta method therefore implies that
 * $$\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0, \nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta) \right)$$

or in univariate terms,
 * $$\sqrt{n}\left(h(B)-h(\beta)\right)\,\xrightarrow{D}\,N\left(0, \operatorname{Var}(B) \cdot \left(h'(\beta)\right)^2 \right).$$
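
The quadratic form above is straightforward to compute numerically. The following sketch (the function $$h$$, the point $$\beta$$, and the covariance matrix are invented for illustration, not quantities from the article) evaluates $$\nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta)$$ for $$h(b_1,b_2)=b_1/b_2$$:

```python
# Illustrative sketch of the multivariate delta-method variance; h, beta and
# Var(B) are made-up example values.
import numpy as np

def delta_method_var(grad_h, var_B):
    """Quadratic form  grad_h^T . Var(B) . grad_h  from the delta method."""
    grad_h = np.asarray(grad_h)
    return float(grad_h @ np.asarray(var_B) @ grad_h)

# Example: h(b1, b2) = b1 / b2, so grad h(beta) = (1/b2, -b1/b2**2) at beta.
beta = np.array([2.0, 4.0])
var_B = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
grad = np.array([1 / beta[1], -beta[0] / beta[1] ** 2])

print(delta_method_var(grad, var_B))  # approximate Var(h(B))
```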

Example
Suppose $$X_n$$ is binomial with parameters $$p$$ and $$n$$. Since
 * $${\sqrt{n} \left[ \frac{X_n}{n}-p \right]\,\xrightarrow{D}\,N(0,p (1-p))},$$

we can apply the delta method with $$ g(\theta) = \log(\theta) $$ to see
 * $${\sqrt{n} \left[ \log\left( \frac{X_n}{n}\right)-\log(p)\right] \,\xrightarrow{D}\,N(0,p (1-p) [1/p]^2)}.$$

Hence, the variance of $$ \log \left( \frac{X_n}{n} \right) $$ is approximately $$ \frac{1-p}{p \, n} $$. Moreover, if $$\hat p $$ and $$\hat q$$ are estimates of different group rates from independent samples of sizes $$n$$ and $$m$$ respectively, then the logarithm of the estimated relative risk $$ \frac{\hat p}{\hat q} $$ is approximately normally distributed with variance that can be estimated by $$ \frac{1-\hat p}{\hat p \, n}+\frac{1-\hat q}{\hat q \, m} $$. This is useful for constructing a hypothesis test or a confidence interval for the relative risk.
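
As a hedged sketch of how this estimate might be used in practice (the event counts and sample sizes below are invented for illustration), the delta-method variance of the log relative risk gives an approximate confidence interval on the log scale, which is then exponentiated:

```python
# Sketch of the relative-risk interval described above; the counts x, y and
# sample sizes n, m are assumed example values.
import math

def log_rr_confint(x, n, y, m, z=1.96):
    """Approximate CI for the relative risk p_hat / q_hat via the delta method."""
    p_hat, q_hat = x / n, y / m
    log_rr = math.log(p_hat / q_hat)
    # Delta-method variance of log(p_hat) - log(q_hat) for independent samples:
    var = (1 - p_hat) / (p_hat * n) + (1 - q_hat) / (q_hat * m)
    half_width = z * math.sqrt(var)
    return math.exp(log_rr - half_width), math.exp(log_rr + half_width)

print(log_rr_confint(x=30, n=100, y=20, m=120))  # approximate 95% CI for p/q
```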

Note
The delta method is nearly identical to the formulae presented in Klein (1953, p. 258):

 * $$\begin{align} \operatorname{Var}\left( h_r \right) = & \sum_i \left( \frac{ \partial h_r }{ \partial B_i } \right)^2 \operatorname{Var}\left( B_i \right) + \\ & \sum_i \sum_{j \neq i} \left( \frac{ \partial h_r }{ \partial B_i } \right) \left( \frac{ \partial h_r }{ \partial B_j } \right) \operatorname{Cov}\left( B_i, B_j \right) \\ \operatorname{Cov}\left( h_r, h_s \right) = & \sum_i \left( \frac{ \partial h_r }{ \partial B_i } \right) \left( \frac{ \partial h_s }{ \partial B_i } \right) \operatorname{Var}\left( B_i \right) + \\ & \sum_i \sum_{j \neq i} \left( \frac{ \partial h_r }{ \partial B_i } \right) \left( \frac{ \partial h_s }{ \partial B_j } \right) \operatorname{Cov}\left( B_i, B_j \right) \end{align}$$

where $$h_r$$ is the $$r$$th element of $$h(B)$$ and $$B_i$$ is the $$i$$th element of $$B$$. The only difference is that Klein stated these as identities, whereas they are actually approximations.
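
A small numerical check (with invented partial derivatives and an invented covariance matrix) confirms that Klein's element-wise sums agree with the matrix form $$\nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta)$$ used earlier:

```python
# Numerical check (illustrative values only) that Klein's element-wise sums
# reproduce the matrix form  grad_h^T . Var(B) . grad_h  used earlier.
import numpy as np

grad_h = np.array([0.5, -1.0, 2.0])   # assumed partial derivatives of h_r w.r.t. B_i
var_B = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 0.5]])    # Var(B_i) on the diagonal, Cov(B_i, B_j) off it

# Klein's formula: a variance term for each B_i plus cross-covariance terms.
klein = (sum(grad_h[i] ** 2 * var_B[i, i] for i in range(3))
         + sum(grad_h[i] * grad_h[j] * var_B[i, j]
               for i in range(3) for j in range(3) if j != i))

matrix_form = grad_h @ var_B @ grad_h
print(np.isclose(klein, matrix_form))  # True
```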