Restrictive imputation of incomplete survey data
Imputation corrects for missing data by using models to estimate the missing values and adding these estimates to the original dataset. The completed dataset can then be analyzed with methods for complete data. Assessing the reliability of estimates based on imputed data, however, requires special techniques, because standard methods for complete data do not distinguish between real and imputed values.
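As an illustration of such a special technique, the sketch below pools results from several imputed datasets using Rubin's rules, a widely used approach in multiple imputation. The project description does not name a specific technique, so this choice is an assumption, and the function name `pool_rubin` and its arguments are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the complete-data variance of each estimate
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                # pooled point estimate
    u_bar = u.mean()                # within-imputation variance
    b = q.var(ddof=1)               # between-imputation variance
    total = u_bar + (1 + 1 / m) * b # total variance
    return q_bar, total
```

The between-imputation term is what a standard complete-data analysis would miss: it reflects the extra uncertainty caused by the values being imputed rather than observed.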
Imputations are predictions of the values that would have been encountered had the missing data been observed. Because imputations are, to some extent, used as real observations, these predictions have to be as accurate as possible. To obtain accurate estimates, models have to be constructed that optimally represent the properties of the variables and the relationships between them. Besides predicting well, plausible imputations must also be consistent with a priori knowledge, such as restrictions on a variable (e.g. an income must be greater than or equal to zero) or restrictions that conform to known population distributions (e.g. the known number of cars in a country).
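To make the idea of a variable restriction concrete, here is a minimal sketch of one possible approach, a simplified form of predictive mean matching. Each missing value is replaced by an observed value from a case with a similar regression prediction, so imputations automatically stay within the observed range (for example, a non-negative income). This is an illustration under stated assumptions, not the method proposed in the project; the function `pmm_impute` and its parameters are hypothetical, and the parameter draws used in full implementations are omitted.

```python
import numpy as np

def pmm_impute(x, y, k=5, seed=0):
    """Impute NaNs in y from predictor x by simplified predictive mean matching."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    obs = ~miss

    # Fit a linear regression of y on x using the observed cases only.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta

    donor_pred = pred[obs]
    donor_y = y[obs]
    out = y.copy()
    for i in np.flatnonzero(miss):
        # Take the k observed cases whose predictions are closest ...
        nearest = np.argsort(np.abs(donor_pred - pred[i]))[:k]
        # ... and borrow the observed value of one of them at random.
        out[i] = donor_y[rng.choice(nearest)]
    return out
```

Because every imputed value is copied from an observed case, a restriction that holds for all observed values also holds for the imputations. Restrictions across variables or on population totals are not handled by this sketch.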
This research proposal distinguishes three topics: imputing variables that have to meet restrictions (§A), imputing semi-continuous variables (§B), and measuring the quality of imputation models and the accuracy and reliability of estimates based on imputed data (§C). These topics can be addressed within a PhD position, resulting in a dissertation and new software. Expected results include answers to the following general research questions:
– How can imputations under row and column restrictions be executed?
– How can imputations on semi-continuous data best be done?
– How can imputations most effectively and plausibly be evaluated?

Furthermore, based on the research in this PhD project, recommendations will be made for the routine use of imputation methods at Statistics Netherlands.

Financed by
Utrecht University and Statistics Netherlands (CBS)

– prof. dr S. van Buuren (Utrecht University)
– dr J. Pannekoek (CBS)
– dr L.E. Frank (daily supervisor, Utrecht University)

1 September 2009 – 1 September 2014