Investigation of statistical properties of proper ways to combine the nonresponse model and the outcome model for drawing imputations
Shahab Jolani (PhD student)
Department of Methodology & Statistics
Faculty of Social Sciences
Project: project financed by Utrecht University
Project running from: 1 July 2010 – 1 July 2012
Promotores: Prof. Dr. S. van Buuren (UU) and Dr. L. E. Frank (UU)
Missing values are undesirable for a correct statistical analysis of data, and statisticians have long sought ways to deal with them. The older, simpler strategy is to use ad hoc methods (e.g. available-case or complete-case analysis), which can introduce bias into the estimates and distort features of the data such as variability and symmetry. Rubin (1987) introduced the idea of replacing each missing value more than once in the data set prior to analysis. Each completed data set is then analyzed in the same way with a complete-data method, and the results are pooled into a single inference. This approach, called Multiple Imputation (MI), has become increasingly popular and is considered the state of the art in missing data analysis (Schafer and Graham, 2002). If used correctly, MI produces estimates that are consistent, asymptotically normally distributed and asymptotically efficient. In addition, MI can be used with virtually any kind of data, and software is available to perform the analyses. Moreover, if the observed data contain useful information for predicting the missing values, an imputation procedure can use this information to maintain high precision. MI also has drawbacks. It can be difficult to implement, and it is easy to apply incorrectly. Most importantly, MI produces different estimates (ideally only slightly different) each time it is applied to the same data set, because random variation is deliberately introduced in the imputation process. Without this random component, deterministic imputation methods generally underestimate the variances of variables with missing data. A recent overview of MI is given by Enders (2010) and the references therein. A broad investigation in the context of medical research is provided by Kenward and Carpenter (2007).
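The impute, analyze and pool cycle described above can be illustrated with a minimal Python sketch. It is not the method developed in this project: the function names `pool_rubin` and `impute_and_pool` are hypothetical, the example estimates only the mean of a single incomplete variable `y` from a fully observed covariate `x`, and the imputation step is deliberately simplified. In particular, it reuses the fitted regression parameters in every imputation, whereas a proper imputation procedure would also draw those parameters from their posterior distribution. The pooling step follows the standard combining rules, in which the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their variances."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    total = u_bar + (1 + 1 / m) * b          # total variance
    return q_bar, total


def impute_and_pool(x, y, m=20, seed=0):
    """Estimate the mean of y, where missing entries of y are None.

    Simplified stochastic regression imputation of y on x (assumption:
    parameter uncertainty is ignored, so this is not a proper imputation).
    """
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in obs]
    ys = [yi for _, yi in obs]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)

    # Least-squares fit of y on x using the observed cases only.
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in obs)
    sigma = (rss / (len(obs) - 2)) ** 0.5    # residual standard deviation

    estimates, variances = [], []
    for _ in range(m):
        # Fill each missing value with its prediction plus random noise.
        completed = [
            yi if yi is not None
            else intercept + slope * xi + rng.gauss(0, sigma)
            for xi, yi in zip(x, y)
        ]
        # Complete-data analysis: sample mean and its squared standard error.
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)
```

Calling `impute_and_pool(x, y, m=20)` on two equal-length lists returns the pooled mean and its total variance. Changing `seed` changes the result slightly, which is the run-to-run variation mentioned above.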
The most complex step in MI is specifying the imputation model, a task whose difficulty depends on the missing data mechanism. It is generally accepted that imputation models should condition on the determinants in both the outcome model and the nonresponse model. There are potentially many ways to combine the two models, and it is not yet clear how they should be represented in the imputation model. This research project will develop new methods with desirable statistical properties for dealing with different types of missing data mechanisms.
Four research topics will be distinguished in this research project: (i) imputation models based on a combination of the outcome and the nonresponse models when the missing data mechanism is ignorable, (ii) imputation models based on a combination of the outcome and the nonresponse models when the missing data mechanism is NOT ignorable, (iii) compatibility of the fully conditional specification approach in imputation models, and (iv) imputation in planned missing data patterns. The following research questions will be addressed:
- What is the proper way to combine the outcome model and the nonresponse model for drawing imputations when data are missing at random?
- What is the proper way to combine the outcome model and the nonresponse model for drawing imputations when data are NOT missing at random?
- Under what circumstances will the fully conditional specification approach converge?
- Can we impute the missing potential outcomes in nonrandomized studies, and estimate the treatment effect from the individual differences between potential outcomes?
The results will be presented in several research papers that will constitute the dissertation. Furthermore, based on the research in this PhD project, recommendations for the routine use of imputation methods will be made, and R code will be developed for the new methods created during the project.
Date of defence: 7 December 2012
Title of thesis: Dual imputation strategies for analyzing incomplete data
Publisher: Zuidam Uithof Drukkerijen (Utrecht)