Beta-binomial model

Overview
In empirical Bayes methods, the Beta-binomial model is an analytic model where the likelihood function $$L(x|\theta)$$ is specifed by a binomial distribution


 * $$L(x|\theta) = \operatorname{Bin}(x,\theta)\,$$


 * $$ = {n\choose x}\theta^x(1-\theta)^{n-x}\,$$

and the conjugate prior $$\pi(\theta|\alpha,\beta)$$ is a Beta distribution


 * $$\pi(\theta|\alpha,\beta) = \mathrm{Beta}(\alpha,\beta)\,$$


 * $$ = \frac{1}{\mathrm{B}(\alpha,\beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}.$$

Derivation of the posterior and the marginal
It is convenient to reparameterize the distributions so that the expected mean of the prior is a single parameter: Let


 * $$\pi(\theta|\mu,M) = Beta(\mu,M)\,$$


 * $$ = \frac{\Gamma (M)}{\Gamma(\mu M)\Gamma(M(1-\mu))} \theta^{M\mu-1}(1-\theta)^{M(1-\mu)-1}$$

where


 * $$ \mu = \frac{\alpha}{\alpha+\beta}\,$$ and  $$ M = \alpha+\beta\,$$

so that


 * $$E(\theta|\mu,M) = \mu\,$$
 * $$ Var(\theta|\mu,M) = \frac{\mu(1-\mu)}{M+1}$$

The posterior distribution $$\rho(\theta|x)$$ is also a beta distribution


 * $$ \rho(\theta|x) \propto l(x|\theta)\pi(\theta|\mu,M) $$


 * $$ = Beta(x+M \mu, n-x+M(1- \mu) ) \, $$


 * $$ = \frac{\Gamma (M) }{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x}\theta^{x+M\mu-1}(1-\theta)^{n-x+M(1-\mu)-1}\,$$

while the marginal distribution $$m(x|\mu, M)$$ is given by


 * $$ m(x|\mu,M) = \int_{0}^{1} l(x|\theta)\pi(\theta|\mu, M) d\theta $$


 * = $$\frac{\Gamma (M) }{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x} \int_{0}^{1} \theta^{x+M\mu-1}(1-\theta)^{n-x+M(1-\mu)-1} d\theta $$


 * = $$\frac{\Gamma (M) }{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x}

\frac{\Gamma (x+M\mu)\Gamma(n-x+M(1-\mu)) }{\Gamma(n+M)} $$

Moment estimates
Because the marginal is a complex, non-linear function of Gamma and Digamma functions, it is quite difficult to obtain a marginal maximum likelihood estimate (MMLE) for the mean and variance. Instead, we use the method of iterated expectations to find the expected value of the marginal moments.

Let us write our model as a two-stage compund sampling model (for each event i out of ni possible events):


 * $$ X_{i}|\theta_{i} \sim \mathrm{Bin}(n_{i}, \theta_{i})\,$$


 * $$ \theta_{i} \sim \mathrm{Beta}|(\mu,M), i.i.d\,$$

We can find iterated moment estimates for the mean and variance using the moments for the distributions in the two-stage model:


 * $$E\left(\frac{X}{n}\right) = E\left[E\left(\frac{X}{n}|\theta\right)\right] = E(\theta) = \mu$$


 * $$\mathrm{var}\left(\frac{X}{n}\right) = E\left[\mathrm{var}\left(\frac{X}{n}|\theta\right)\right] + \mathrm{var}\left[E\left(\frac{X}{n}|\theta\right)\right]$$


 * $$ = E\left[\left(\frac{1}{n}\right)\theta(1-\theta)|\mu,M\right] + \mathrm{var}\left(\theta|\mu,M\right)$$


 * $$ = \frac{1}{n}\left(\mu(1-\mu)\right)+ \frac{n_{i}-1}{n_{i}}\frac{(\mu(1-\mu))}{M+1}$$


 * $$ = \frac{\mu(1-\mu)}{n}\left(1+\frac{n-1}{M+1}\right).$$

We now seek a point estimate $$\tilde{\theta}$$ as a weighted average of the sample estimate $$\hat{\theta_{i}}$$ and an estimate for $$\hat{\mu}$$. The sample estimate $$\hat{\theta_{i}}$$ is given by


 * $$\hat{\theta_{i}} = \frac{x_{i}}{n_{i}}\,$$

Therefore we need point estimates for $$\mu$$ and $$M$$. The estimated mean $$\hat{\mu}$$ is given as a weighted average


 * $$\hat{\mu} = \frac{\sum_{i=1}^{N} n_{i} \hat{\theta_{i}} }{\sum_{i=1}^{N} n_{i} } = \frac{\sum_{i=1}^{N} x_{i} }{\sum_{i=1}^{N} n_{i} }$$

The hyperparameter $$M$$ is obtained using the moment estimates for the variance of the two-stage model:


 * $$s^{2} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{var}\left(\frac{x_{i}}{n_{i}}\right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\hat{\mu}(1-\hat{\mu})}{n_{i}} \left[1+\frac{n_{i}-1}{\hat{M}+1}\right]= $$


 * $$s^{2} = \frac{N \sum_{i=1}^{N} n_{i} (\hat{\theta_{i}} - \hat{\mu_{i}})^{2} }{(N-1)\sum_{i=1}^{N} n_{i} } $$

We can now solve for $$\hat{M}$$:


 * $$\hat{M} = \frac{\hat{\mu}(1-\hat{\mu}) - s^{2}}{s^{2} - \frac{\hat{\mu}(1 - \hat{\mu})}{N}\sum_{i=1}^{N} n_{i} }.$$

Given our point estimates for the prior, we may now plug in these values to find a point estimate for the posterior


 * $$\tilde{\theta_{i}} = E(\Theta|\hat{\mu},\hat{M}) = \frac{\frac{x_{i}}{n_{i}} + \hat{M}\hat{\mu}}{n_{i}+\hat{M}} = \frac{\hat{M}}{n_{i}+\hat{M_{i}}}\hat{\mu} + \frac{n_{i}}{n_{i}+\hat{M_{i}}}\frac{x_{i}}{n_{i}}$$

Shrinkage factors
We may write the posterior estimate as a weighted average:


 * $$\tilde{\theta_{i}} = \hat{B_{i}}\hat{\mu} + (1-\hat{B_{i}})\hat{\theta_{i}}$$

where $$\hat{B_{i}}$$ is called the Shrinkage Factor.


 * $$\hat{B_{i}} = \frac{\hat{M_{i}}}{\hat{M_{i}}+n_{i}}$$