Computational Evaluation for Dark Data Science
Multiple imputation (Rubin, 1976) is a state-of-the-art technique for drawing valid conclusions from incomplete data, and it is increasingly used in many substantive fields. The R package mice (van Buuren & Groothuis-Oudshoorn, 2011) has become the de facto standard for the statistical analysis of incomplete data.
With this project, we will connect applied researchers to statisticians by building a bridge between computer science, data visualization, and inferential statistics. We do so by extending and improving the iterative multiple imputation framework such that both applied researchers and advanced users can use it and assess the validity of its results.
The aim is to develop shinymice: an interactive and dynamic tool that guides users through building and evaluating multiple imputation models. Building imputation models and evaluating the statistical validity of the drawn imputations is challenging for people without thorough training. shinymice addresses this by helping users build their imputation and non-response models such that these models do not clash with their analysis model. After imputation, shinymice guides users in determining the plausibility of the imputations and, thus, the plausibility of the inference obtained on their substantive model of interest. shinymice is for everyone, but it is particularly needed by applied researchers and analysts who lack the statistical background required to draw valid inference through multiple imputation independently.
To achieve the project aims, novel methodology and data visualization tools will be investigated, evaluated, and implemented in shinymice. For example, there has not been a systematic study on how to evaluate the convergence of iterative imputation algorithms (van Buuren, 2018), even though all inferences rely on algorithmic convergence. Fortunately, there are non-convergence identifiers for other iterative algorithms (e.g., Vehtari et al., 2021), but the validity of these identifiers has not been systematically evaluated on imputation algorithms. One of the project milestones will thus be a comparison of different methods for the evaluation of algorithmic non-convergence in iterative imputation, resulting in a guide for practice. Other project deliverables are vignettes and/or tutorials geared towards empirical researchers, which translate existing methodological research into actionable advice and applications (e.g., for the imputation of multilevel data; Audigier et al., 2018).
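As a rough illustration of the kind of non-convergence identifier mentioned above, the sketch below computes a split version of the potential scale reduction factor (R-hat) for a scalar summary that is tracked over iterations in several parallel imputation streams. It is written in Python with the standard library only and is not the mice or shinymice implementation. It also omits the rank normalization proposed by Vehtari et al. (2021). The function names split_rhat and simulate_chains, the choice of tracked scalar (for example, the mean of the imputed values per iteration), and the simulated chains are all assumptions made for illustration.

```python
import math
import random
import statistics


def split_rhat(chains):
    """Split R-hat for one scalar summary tracked over iterations.

    chains: list of equal-length sequences, one per imputation stream
    (hypothetical input; e.g. the mean of the imputed values per iteration).
    """
    n_iter = len(chains[0])
    if any(len(chain) != n_iter for chain in chains):
        raise ValueError("all chains must have the same length")
    half = n_iter // 2
    if half < 2:
        raise ValueError("need at least 4 iterations per chain")

    # Split every chain into a first and a second half, so that a trend
    # within a chain shows up as variance between the halves.
    halves = []
    for chain in chains:
        halves.append(list(chain[:half]))
        halves.append(list(chain[n_iter - half:]))

    half_means = [statistics.fmean(h) for h in halves]
    # W: average within-half sample variance.
    within = statistics.fmean([statistics.variance(h) for h in halves])
    # B: between-half variance, scaled by the half length.
    between = half * statistics.variance(half_means)

    if within == 0.0:
        # Degenerate case: no variation inside any half.
        return 1.0 if between == 0.0 else math.inf

    var_plus = (half - 1) / half * within + between / half
    return math.sqrt(var_plus / within)


def simulate_chains(n_chains, n_iter, drift, seed):
    """Simulated stand-in for imputation chains (illustration only)."""
    rng = random.Random(seed)
    return [
        [drift * t + rng.gauss(0.0, 1.0) for t in range(n_iter)]
        for _ in range(n_chains)
    ]


stationary = simulate_chains(n_chains=5, n_iter=200, drift=0.0, seed=1)
trending = simulate_chains(n_chains=5, n_iter=200, drift=0.05, seed=1)

print("stationary chains:", round(split_rhat(stationary), 3))
print("trending chains:  ", round(split_rhat(trending), 3))
```

Values close to 1 give no indication of non-convergence, whereas the trending chains yield a clearly larger value. Which summary to track and which threshold to apply in the context of iterative imputation is precisely the open question this project aims to address.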
In short, this project encompasses the development and evaluation of methods and techniques for the analysis of incomplete data, which is ubiquitous in the social and behavioral sciences. With that, I aim to contribute to the fields of psychometrics and sociometrics, and I would very much like to become part of IOPS.
Prof. dr. S. van Buuren
Dr. G. Vink
January 2023 – March 2026