Statistical benchmarking

In statistics, benchmarking is a method of using auxiliary information to adjust the sampling weights used in an estimation process, in order to yield more accurate estimates of totals.

Suppose we have a population where each unit $$k$$ has a "value" $$Y(k)$$ associated with it. For example, $$Y(k)$$ could be the wage of an employee $$k$$, or the cost of an item $$k$$. Suppose we want to estimate the sum $$Y$$ of all the $$Y(k)$$. To do so, we take a sample of the units $$k$$, obtain a sampling weight $$W(k)$$ for each sampled $$k$$, and then sum $$W(k) \times Y(k)$$ over all sampled $$k$$.

One property usually common to the weights $$W(k)$$ described here is that their sum over all sampled $$k$$ is an estimate of the total number of units $$k$$ in the population (for example, the total employment, or the total number of items). Because it is based on a sample, this estimate will generally differ from the true number of units in the population. Similarly, the estimate of the total $$Y$$ (the sum of $$W(k) \times Y(k)$$ over all sampled $$k$$) will generally differ from the true population total.
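The two weighted sums described above can be illustrated with a minimal sketch. The sample data and the function names `estimate_total` and `estimate_population_size` are hypothetical, chosen only for illustration.

```python
# Hypothetical sample: one (W(k), Y(k)) pair per sampled unit k.
# The numbers are made up purely for illustration.
sample = [
    (10.0, 52000.0),
    (10.0, 48000.0),
    (25.0, 31000.0),
]


def estimate_total(sample):
    """Estimate the population total of Y as the sum of W(k) * Y(k)."""
    return sum(w * y for w, y in sample)


def estimate_population_size(sample):
    """Estimate the number of units in the population as the sum of W(k)."""
    return sum(w for w, _ in sample)


estimated_total = estimate_total(sample)
estimated_count = estimate_population_size(sample)
```

Both quantities are estimates: a different sample would generally give different values for each.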

We do not know the true population total $$Y$$ (if we did, there would be no point in sampling!). Yet often we do know the true value of the quantity that the sum of the $$W(k)$$ estimates, namely the total number of units in the population. For example, we may not know the total earnings of the population or the total cost of the population, but often we know the total employment or total volume of sales. And even if we do not know these exactly, surveys done by other organizations or at earlier times often provide very accurate estimates of these auxiliary quantities.

The benchmarking procedure begins by breaking the population into benchmarking cells. Cells are formed by grouping together units that share common characteristics, for example, similar $$Y(k)$$, although any grouping that enhances the accuracy of the final estimates can be used. For each cell $$C$$, we let $$W(C)$$ be the sum of the $$W(k)$$ over all sampled $$k$$ in the cell $$C$$, and we let $$T(C)$$ be the auxiliary value for the cell, which is commonly called the "benchmark target" for cell $$C$$. Next, we compute a benchmark factor $$F(C) = T(C) / W(C)$$. Then, we adjust each weight $$W(k)$$ by multiplying it by the benchmark factor $$F(C)$$ of its cell $$C$$. The net result is that the estimated $$W$$ [formed by summing $$F(C) \times W(k)$$ over all sampled $$k$$] will now equal the benchmark target total $$T$$, the sum of the $$T(C)$$ over all cells. But the more important benefit is that the estimate of the total of $$Y$$ [formed by summing $$F(C) \times W(k) \times Y(k)$$] will tend to be more accurate.
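The procedure above can be sketched in a few lines of code. This is a minimal illustration rather than a reference implementation: the cell labels, sample values, benchmark targets, and the function name `benchmark_weights` are all hypothetical, and every cell is assumed to contain at least one sampled unit so that $$W(C)$$ is nonzero.

```python
from collections import defaultdict

# Hypothetical sample records: (cell label C, sampling weight W(k), value Y(k)).
sample = [
    ("A", 10.0, 52000.0),
    ("A", 10.0, 48000.0),
    ("B", 25.0, 31000.0),
    ("B", 25.0, 29000.0),
]

# Hypothetical benchmark targets T(C), e.g. known unit counts per cell.
targets = {"A": 24.0, "B": 45.0}


def benchmark_weights(sample, targets):
    """Return (factors, adjusted), where each adjusted weight is F(C) * W(k)."""
    # W(C): sum of the sampling weights over the sampled units in each cell.
    cell_weight = defaultdict(float)
    for cell, w, _ in sample:
        cell_weight[cell] += w

    # F(C) = T(C) / W(C); assumes W(C) > 0 for every cell in the sample.
    factors = {cell: targets[cell] / cell_weight[cell] for cell in cell_weight}

    # Multiply each unit's weight by the factor of its cell.
    adjusted = [(cell, factors[cell] * w, y) for cell, w, y in sample]
    return factors, adjusted


factors, adjusted = benchmark_weights(sample, targets)

# Sum of F(C) * W(k): matches the benchmark target total T.
benchmarked_count = sum(w for _, w, _ in adjusted)

# Sum of F(C) * W(k) * Y(k): the benchmarked estimate of the total of Y.
benchmarked_total = sum(w * y for _, w, y in adjusted)
```

With these made-up numbers, $$W(A) = 20$$ and $$W(B) = 50$$, so the factors are $$F(A) = 1.2$$ and $$F(B) = 0.9$$, and the adjusted weights sum to the target total $$T = 69$$.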