A priori (statistics)

In statistics, a priori knowledge is prior knowledge about a population, as opposed to knowledge estimated from recent observation. In Bayesian inference it is common to make inferences conditional upon this knowledge, and the integration of a priori knowledge is the central difference between the Bayesian and frequentist approaches to statistics. We need not be 100% certain about something before it can be considered a priori knowledge, but estimation conditional upon assumptions for which there is little evidence should be avoided. A priori knowledge often consists of knowledge of the domain of a parameter (for example, that it is positive), which can be incorporated to improve an estimate. Within this domain the distribution is usually assumed to be uniform in order to take advantage of certain theoretical results (most importantly the central limit theorem).
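
As a sketch of how knowledge of a domain enters the calculation, write $$\theta$$ for the parameter, $$x$$ for the observed data and $$D$$ for the known domain (this notation is introduced here for illustration). With a prior that is uniform on $$D$$ and zero elsewhere, Bayes' theorem gives:
 * $$ p(\theta \mid x) \propto p(x \mid \theta)\,\mathbf{1}[\theta \in D] $$

Under this assumption the posterior is the likelihood truncated to $$D$$, so its mode is the value in $$D$$ at which the likelihood is largest.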

Basic example
Suppose that we pick (without replacement) two red beads and three black beads from a bag; what is the probability that the next bead we pick out will be red? Without a priori knowledge, we cannot reasonably answer the question. But if we already knew that the bag originally contained only two red beads, then we could be certain that the probability of picking out another red bead is zero, because both have already been drawn. In this instance we are 100% certain that the probability is zero, essentially because the population is finite.
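
The bead calculation can be sketched in a few lines of Python. The function name `prob_next_red` and the number of black beads in the bag are hypothetical, chosen only for illustration; the text fixes only the number of red beads and the beads already drawn.

```python
from fractions import Fraction


def prob_next_red(red_in_bag, black_in_bag, red_drawn=2, black_drawn=3):
    """Probability that the next bead is red, given the bag's known contents.

    red_in_bag and black_in_bag are the a priori knowledge: the contents of
    the bag before any bead was drawn.
    """
    red_left = red_in_bag - red_drawn
    black_left = black_in_bag - black_drawn
    if red_left < 0 or black_left < 0:
        raise ValueError("more beads drawn than the bag contained")
    if red_left + black_left == 0:
        raise ValueError("the bag is empty")
    return Fraction(red_left, red_left + black_left)


# Only two red beads to begin with (10 black beads is an assumed figure):
print(prob_next_red(2, 10))  # → 0

# A different assumed bag, five red and five black, gives a different answer:
print(prob_next_red(5, 5))   # → 3/5
```

The same five draws lead to different answers depending on what is known about the bag beforehand, which is the point of the example.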

More theoretical example
Suppose that we are trying to estimate the coefficients of an autoregressive (AR) stochastic process based on recorded data, and we know beforehand that the process is stationary. Any AR(2) process can be written in the following form, where $$\epsilon_k$$ is a white-noise term:
 * $$ X_{k} + \theta_1 X_{k-1} + \theta_2  X_{k-2} = \epsilon_k $$

Under the classical frequentist approach, we would proceed with maximum likelihood estimation (MLE). Instead, we can integrate our knowledge into the likelihood function and maximize the likelihood conditional upon the fact that the process is stationary. We can assign prior distributions to the AR coefficients $$\theta_1,\theta_2$$ that are uniform across a limited domain, in line with the constraints on the coefficients of a stationary process. For an AR(2) process, the constraints are:
 * $$|\theta_2| < 1,$$
 * $$ \theta_2 + 1 > |\theta_1| $$
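
The two constraints translate directly into a membership test for the stationary domain. The sketch below uses hypothetical function names and the sign convention of the AR(2) equation above, under which stationarity is equivalent to both roots of $$z^2 + \theta_1 z + \theta_2 = 0$$ lying strictly inside the unit circle. The second function checks that condition directly as a cross-check.

```python
import cmath


def is_stationary_ar2(theta1, theta2):
    """Test the two inequality constraints on the AR(2) coefficients."""
    return abs(theta2) < 1 and theta2 + 1 > abs(theta1)


def roots_inside_unit_circle(theta1, theta2):
    """Equivalent check: both roots of z^2 + theta1*z + theta2 have modulus < 1."""
    disc = cmath.sqrt(theta1 * theta1 - 4 * theta2)
    roots = ((-theta1 + disc) / 2, (-theta1 - disc) / 2)
    return all(abs(r) < 1 for r in roots)


print(is_stationary_ar2(0.5, 0.3))         # → True
print(is_stationary_ar2(1.5, 0.4))         # → False
print(roots_inside_unit_circle(1.5, 0.4))  # → False
```

Exactly on the boundary of the domain the two checks may disagree because of floating-point rounding, so the comparison is only meaningful away from it.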

Adding this information changes the likelihood function, and when we now use MLE to estimate the coefficients, we will in general obtain a better estimate. This is particularly true when we suspect that the coefficients are near the boundary of the stationary domain. Note that the distribution on the domain is uniform, so we have made no assumptions about what the coefficients actually are, only about their domain.
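
A minimal sketch of such a constrained estimate follows, under assumptions that are not part of the text above: the noise is Gaussian, the likelihood is taken conditional on the first two observations (so that maximizing it amounts to minimizing the residual sum of squares), and the maximization over the stationary domain is done by a simple grid search. All function names, the grid step and the simulated coefficients are hypothetical.

```python
import random


def simulate_ar2(theta1, theta2, n, seed=0):
    """Simulate X_k + theta1*X_{k-1} + theta2*X_{k-2} = eps_k with N(0, 1) noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]  # arbitrary starting values (an assumption of this sketch)
    for _ in range(n):
        x.append(-theta1 * x[-1] - theta2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


def is_stationary_ar2(theta1, theta2):
    """The two stationarity constraints on the AR(2) coefficients."""
    return abs(theta2) < 1 and theta2 + 1 > abs(theta1)


def moment_sums(x):
    """Sums of products that determine the residual sum of squares."""
    s00 = s01 = s02 = s11 = s12 = s22 = 0.0
    for k in range(2, len(x)):
        a, b, c = x[k], x[k - 1], x[k - 2]
        s00 += a * a
        s01 += a * b
        s02 += a * c
        s11 += b * b
        s12 += b * c
        s22 += c * c
    return s00, s01, s02, s11, s12, s22


def sse(theta1, theta2, sums):
    """Sum over k of (x_k + theta1*x_{k-1} + theta2*x_{k-2})^2, expanded."""
    s00, s01, s02, s11, s12, s22 = sums
    return (s00 + 2 * theta1 * s01 + 2 * theta2 * s02
            + theta1 ** 2 * s11 + 2 * theta1 * theta2 * s12 + theta2 ** 2 * s22)


def unconstrained_estimate(sums):
    """Least-squares solution of the normal equations, ignoring stationarity."""
    _, s01, s02, s11, s12, s22 = sums
    det = s11 * s22 - s12 * s12
    return ((s12 * s02 - s22 * s01) / det, (s12 * s01 - s11 * s02) / det)


def constrained_estimate(sums, step=0.01):
    """Grid search for the smallest SSE inside the stationary domain only."""
    n1, n2 = round(2 / step), round(1 / step)
    best = None
    for i in range(-n1, n1 + 1):
        for j in range(-n2, n2 + 1):
            t1, t2 = i * step, j * step
            if not is_stationary_ar2(t1, t2):
                continue  # zero prior weight outside the domain
            value = sse(t1, t2, sums)
            if best is None or value < best[0]:
                best = (value, t1, t2)
    return best[1], best[2]


# Short series from coefficients close to the boundary of the domain.
sums = moment_sums(simulate_ar2(-1.9, 0.92, n=40, seed=1))
print("unconstrained:", unconstrained_estimate(sums))
print("constrained:  ", constrained_estimate(sums))
```

When the unconstrained estimate already lies inside the stationary domain, the two estimates agree up to the grid resolution; they differ only when the unconstrained estimate falls outside it, which is more likely for short series with coefficients near the boundary.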