Multiple imputation using mixture models
Daniel Van der Palm
MTO, Tilburg School of Social and Behavioral Sciences
Project financed by: NWO (Netherlands Organisation for Scientific Research)
Project running from: 1 September 2009 – 1 September 2013
Promotores: Prof. Dr. J.K. Vermunt, Prof. Dr. K. Sijtsma
The main focus of this project is on the use of mixture models for multiple imputation (MI) of missing data, or more specifically, item nonresponse. Vermunt, Van Ginkel, van der Ark, and Sijtsma (2008) explored the use of a simple latent class model (Goodman, 1974), which is a mixture model for categorical response variables, as a tool for MI.
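To make the idea concrete, the following is a minimal Python sketch of imputing categorical items with a latent class model. It is an illustration under stated assumptions, not the implementation of Vermunt et al. (2008): the function names (`fit_latent_class`, `posterior`, `impute`), the coding of missing values as -1, and the simulated data are all hypothetical. The sketch also fits the model once by maximum likelihood, so it ignores uncertainty about the model parameters, which a proper MI procedure would have to reflect.

```python
import numpy as np

def posterior(data, observed, class_probs, item_probs):
    """P(class | observed items) per respondent; missing items are skipped."""
    n, n_items = data.shape
    log_post = np.tile(np.log(class_probs), (n, 1))
    for j in range(n_items):
        rows = observed[:, j]
        # item_probs[:, j, :] has shape (classes, categories)
        log_post[rows] += np.log(item_probs[:, j, :][:, data[rows, j]]).T
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def fit_latent_class(data, n_classes, n_categories, n_iter=200, seed=0):
    """EM for a latent class model; missing entries are coded as -1 (assumption)."""
    rng = np.random.default_rng(seed)
    n_items = data.shape[1]
    observed = data >= 0
    class_probs = np.full(n_classes, 1.0 / n_classes)
    # item_probs[k, j, c] = P(item j = c | class k), random starting values
    item_probs = rng.dirichlet(np.ones(n_categories), size=(n_classes, n_items))
    for _ in range(n_iter):
        post = posterior(data, observed, class_probs, item_probs)  # E-step
        class_probs = post.mean(axis=0)                            # M-step
        for j in range(n_items):
            for c in range(n_categories):
                mask = observed[:, j] & (data[:, j] == c)
                item_probs[:, j, c] = post[mask].sum(axis=0) + 1e-6
        item_probs /= item_probs.sum(axis=2, keepdims=True)
    return class_probs, item_probs

def impute(data, class_probs, item_probs, rng):
    """Draw one completed data set: sample a class per respondent, then items."""
    observed = data >= 0
    post = posterior(data, observed, class_probs, item_probs)
    completed = data.copy()
    for i in np.where(~observed.all(axis=1))[0]:
        k = rng.choice(len(class_probs), p=post[i])
        for j in np.where(~observed[i])[0]:
            completed[i, j] = rng.choice(item_probs.shape[2], p=item_probs[k, j])
    return completed

# Simulated example: two classes, four binary items, about 15% missing values.
rng = np.random.default_rng(1)
true_class = rng.integers(0, 2, size=300)
p_one = np.where(true_class[:, None] == 0, 0.8, 0.2)  # P(item = 1 | class)
data = (rng.random((300, 4)) < p_one).astype(int)
data[rng.random(data.shape) < 0.15] = -1              # -1 marks a missing value

class_probs, item_probs = fit_latent_class(data, n_classes=2, n_categories=2)
imputations = [impute(data, class_probs, item_probs, rng) for _ in range(5)]
```

Each of the five completed data sets would then be analysed separately and the results pooled. The class is drawn once per respondent because the latent class model assumes the items are independent only within a class.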
Although this approach is very promising, various issues remain unresolved when applying mixture models for MI. The purpose of this project is to address four unresolved problems mentioned by Vermunt et al. (2008) in the discussion section of their article:
- Whereas Vermunt et al. (2008) concentrated on imputation of data sets containing only categorical variables, most data sets contain combinations of categorical and continuous variables. The current project will investigate how imputation by means of mixture models can best be generalized to such mixed data sets.
- It is unclear whether model selection (deciding which statistical model best explains the data) can proceed in the same way when a mixture model is used to generate multiple imputations as when it is used to build a substantively meaningful model. More specifically, standard model selection statistics such as information criteria (AIC, BIC) and overall goodness-of-fit tests seem less appropriate for deciding whether a model is a good imputation model.
- An extended comparison between MI with mixture models and other MI approaches is lacking. To assess the usefulness of our approach, it is important to investigate in which situations it performs better than possible alternatives, such as multivariate imputation by chained equations (MICE) and hot deck imputation.
- Like most work on MI, the article by Vermunt et al. (2008) dealt with imputation of data sets containing independent observations. However, many studies in the social and behavioural sciences use designs that yield dependent observations, such as multilevel and longitudinal designs. A fourth aim of this project is to develop mixture MI models for dealing with such complex designs.
Besides addressing these four topics, the project should yield software implementations so that the MI methodology becomes available to applied researchers. We aim to make SPSS macros available as freeware on the Internet.
Date of defence: 6 December 2013