Research Group of Quantitative Psychology and Individual Differences
Faculty of Psychology and Educational Sciences
Personal webpage Richard Artner
On 25 March 2022, Richard Artner defended his thesis Reproducibility, robustness, and test severity: New insights for research workers in psychology at the University of Leuven.
This dissertation provides new insights into three paramount topics of psychological research: reproducibility, robustness, and severity. A statistical result is reproducible if independent researchers with adequate skills can verify its correctness by identifying and executing the underlying calculations on the same dataset. Because the field of psychology lacks a culture of data sharing and re-analysis, little is known about the prevalence of reporting errors in scientific publications or about the effort it takes to verify other researchers’ empirical work. To shed light on this topic, Chapter 1 reports the results of a comprehensive study that investigated the reproducibility of the major statistical conclusions drawn in 46 papers published in 2012 in three APA journals. Through an in-depth analysis of our reproduction efforts and the types of mistakes found, we develop a new taxonomy for reproducibility, give practical recommendations on how to achieve reproducibility, and discuss both the challenges of large-scale reproducibility checks and promising ideas that could considerably increase the reproducibility of psychological research.

Preferably, statistical findings are not just reproducible but also robust with respect to reasonable alternative data analyses. To assess the robustness of an empirical finding, Simonsohn et al. (2020) proposed specification curve analysis (SCA), a creative statistical method that combines the inferences of all reasonable and feasible ways to analyze a set of raw data with respect to some scientific hypothesis. In Chapter 2, we first reiterate the essentials of this method and discuss theoretical issues. We then show through extensive simulations that the SCA procedure often produces invalid p-values, and that simpler alternatives outperform the SCA procedure in all studied scenarios.
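To make the core idea of a specification curve concrete, here is a minimal sketch: fit the same substantive question (does x predict y?) under several "reasonable" analysis choices and collect the resulting p-values. The data, the three specifications, and the outlier rule are all invented for illustration; this is not the dissertation's (or Simonsohn et al.'s) implementation, and the p-values use a large-sample normal approximation rather than the exact t distribution.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
z = rng.normal(size=n)                       # a covariate
y = 0.3 * x + 0.5 * z + rng.normal(size=n)   # simulated outcome

def ols_pvalue(X, y):
    """Two-sided p-value for the first (non-intercept) OLS coefficient,
    using a normal approximation to the t distribution (fine for large n)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta[1] / math.sqrt(cov[1, 1])
    return math.erfc(abs(t) / math.sqrt(2))  # 2 * (1 - Phi(|t|))

# Three "reasonable" specifications for the x -> y effect
keep = np.abs(x) < 2                          # an outlier-exclusion rule
specs = {
    "simple":         ols_pvalue(x[:, None], y),
    "with_covariate": ols_pvalue(np.column_stack([x, z]), y),
    "no_outliers":    ols_pvalue(x[keep][:, None], y[keep]),
}
curve = sorted(specs.values())                # the "curve": ordered p-values
```

A real SCA enumerates many more specifications (dozens to thousands) and then asks how the estimates and p-values are distributed across them; the inferential step on top of that distribution is precisely what Chapter 2 scrutinizes.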
Chapter 3 is the serendipitous result of a research project that required simulating data from models specified via partial correlation structures. In this chapter, we derive necessary and sufficient conditions on the eigenvalues of differently defined partial correlation matrices for the correlation structure to be a valid one. Equipped with these conditions, we derive simple conditions on the partial correlations for frequently assumed sparse structures. Furthermore, we show that valid partial correlation matrices can be created via a simple formula that can be used in conjunction with existing algorithms for the generation and approximation of correlation matrices.

Having found reproducible and robust empirical patterns, psychological researchers often want to know what these imply for the scientific hypothesis of interest. From a Popperian perspective, the corroboration status of a scientific hypothesis should be changed to the extent that it has passed a severe test, that is, to the extent that it was used to correctly predict something risky. In Chapter 4, we propose a data-driven method for quantifying the riskiness of an empirical prediction derived from a substantive hypothesis of interest. We demonstrate this approach with an empirical investigation of the relation between watching TV in early childhood and a later inability to concentrate well.
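As background to the partial correlation structures of Chapter 3: the standard link between a full-rank correlation matrix R and its matrix of pairwise partial correlations goes through the inverse K = R⁻¹, via p_ij = −K_ij / √(K_ii K_jj). This textbook identity is shown for illustration only; it is not the dissertation's new formula or its validity conditions, and the example matrix is invented.

```python
import numpy as np

def partial_correlations(R):
    """Partial correlation matrix of a full-rank correlation matrix R,
    via the inverse-covariance identity p_ij = -K_ij / sqrt(K_ii * K_jj)."""
    K = np.linalg.inv(R)
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)   # diagonal is 1 by convention
    return P

# A small (invented) valid correlation matrix
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
P = partial_correlations(R)
```

Going the other way, from a desired partial correlation structure back to a valid correlation matrix, is exactly where the eigenvalue conditions of Chapter 3 come in: not every symmetric matrix with unit diagonal and off-diagonal entries in [−1, 1] corresponds to a valid structure.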
Prof. dr. F. Tuerlinckx, Dr. W. Vanpaemel
1 October 2017 – 25 March 2022