Imputation (statistics)



In statistics, imputation is the substitution of some value for a missing data point or for a missing component of a data point. Once all missing values have been imputed, the dataset can be analysed using standard techniques for complete data. Ideally, however, the analysis should take into account that imputed values carry more uncertainty than values that were actually observed, which generally requires some modification of the standard complete-data methods. Many imputation techniques are available; two of the most commonly used are hot-deck imputation and regression imputation, in which a missing value is replaced by the prediction of a regression model fitted to the observed data.
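As an illustration of regression imputation, the following minimal sketch fits a least-squares line to the complete records and uses it to predict each missing value. The function name regression_impute, the single predictor, and the toy data are assumptions made for this example only; real applications usually involve several predictors and a statistics library.

```python
from statistics import mean


def regression_impute(xs, ys):
    """Replace None entries in ys with predictions from a simple
    least-squares line fitted on the complete (x, y) pairs.

    Hypothetical helper for illustration; assumes every x is observed.
    """
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if len(complete) < 2:
        raise ValueError("need at least two complete records to fit a line")

    x_bar = mean(x for x, _ in complete)
    y_bar = mean(y for _, y in complete)
    sxx = sum((x - x_bar) ** 2 for x, _ in complete)
    if sxx == 0:
        raise ValueError("predictor has no variation among complete records")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in complete)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Observed values are kept; only missing ones are predicted.
    return [
        y if y is not None else intercept + slope * x
        for x, y in zip(xs, ys)
    ]


# Toy data (made up): income is missing for the person aged 40.
ages = [20, 30, 40, 50, 60]
incomes = [25.0, 35.0, None, 55.0, 65.0]
imputed_incomes = regression_impute(ages, incomes)
print(imputed_incomes)  # the missing entry becomes 45.0
```

The imputed value lies exactly on the fitted line, so treating it as if it had been observed understates the variability of the data.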

Hot-deck imputation fills in missing values on incomplete records using values from similar but complete records in the same dataset. (The term "hot deck" dates back to the storage of data on punch cards and indicates that the donors of information come from the same dataset as the recipients; the stack of cards was "hot" because it was currently being processed. Cold-deck imputation, by contrast, selects donors from another dataset.)
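One simple way to make "similar" concrete is to group records into classes that share the same values on a few matching variables and to draw a donor at random from the recipient's class. The sketch below does this under those assumptions; the function name hot_deck_impute, the field names, and the survey data are hypothetical, and other definitions of similarity (for example, nearest-neighbour distance) are equally possible.

```python
import random


def hot_deck_impute(records, target, class_keys, seed=0):
    """Fill missing `target` values with the value of a randomly chosen
    donor from the same dataset that matches on all `class_keys`.

    Hypothetical helper for illustration; missing values are None.
    Returns new records and leaves the input unchanged.
    """
    rng = random.Random(seed)

    # Collect the observed target values of potential donors, by class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[k] for k in class_keys)
            donors.setdefault(key, []).append(rec[target])

    completed = []
    for rec in records:
        filled = dict(rec)
        if filled[target] is None:
            pool = donors.get(tuple(rec[k] for k in class_keys))
            if pool:  # stay missing when no similar complete record exists
                filled[target] = rng.choice(pool)
        completed.append(filled)
    return completed


# Toy survey (made up): two respondents did not report income.
survey = [
    {"region": "north", "sex": "F", "income": 41},
    {"region": "north", "sex": "F", "income": None},
    {"region": "south", "sex": "M", "income": 30},
    {"region": "south", "sex": "M", "income": 34},
    {"region": "south", "sex": "M", "income": None},
]
completed = hot_deck_impute(survey, "income", ["region", "sex"])
```

Here the second respondent can only receive 41, while the last one receives either 30 or 34 depending on the random draw. Passing a different dataset as the donor source would turn the same procedure into cold-deck imputation.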

Because standard analysis techniques treat imputed values as if they had been observed, they do not reflect the additional uncertainty introduced by imputation. Further adjustments, such as multiple imputation or a Rao-Shao correction, are needed to account for it.
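In multiple imputation, the dataset is completed several times, the analysis is run on each completed dataset, and the results are pooled. The sketch below shows only the pooling step, using the combining rules usually attributed to Rubin, which are not described in the text above: the total variance is the average within-imputation variance plus an inflated between-imputation variance. The function name pool_rubin and the numbers are assumptions for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors from
    m imputed datasets (hypothetical helper for illustration).

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")

    pooled = mean(estimates)       # average of the m estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up results from three imputed datasets.
pooled_estimate, total_variance = pool_rubin(
    [10.0, 12.0, 11.0],  # estimate from each completed dataset
    [1.0, 1.0, 1.0],     # squared standard error from each analysis
)
```

With these numbers the pooled estimate is 11.0 and the total variance is about 2.33, more than twice the within-imputation variance of 1.0 that a single complete-data analysis would report.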

Imputation is not the only method available for handling missing data. It usually gives better results than listwise deletion, in which every subject with any missing value is omitted from the analysis and the values that subject did report are discarded along with it. It may also be competitive with a maximum likelihood approach in many circumstances.