Aditi Manoj Bhangale

Institute of Psychology
Methodology & Statistics
Social and Behavioural Sciences
Leiden University

Project
Latent variable models for prediction

In recent years, there has been increased focus on prediction in psychological research (Breiman, 2001; Shmueli, 2010; Yarkoni & Westfall, 2017). The goal of predictive modeling is to use model parameters estimated from one data sample to generate predictions for new or future observations (De Rooij et al., 2022). Until recently, prediction was limited to machine learning methods or traditional models within the generalised linear model (GLM) framework. These models assume that the predictor variables are measured without error (Fox, 2016). For example, a multiple regression model may be used to predict the future sales of a certain item based on previous years’ sales and advertisement revenue. In this example, both sales and advertisement revenue can be counted exactly; in other words, the predictor variables in this model can reasonably be assumed to be measured without error.
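The train-then-predict logic of predictive modeling can be illustrated with the multiple regression example. The following is a minimal sketch on simulated, hypothetical data (not from any study): ordinary least squares coefficients are estimated on one sample and then applied to new predictor values, and the prediction error is computed on the held-out cases. The function names `fit_ols` and `predict_ols` are illustrative only.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate OLS coefficients (intercept first) from a training sample."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict_ols(beta, X_new):
    """Apply coefficients estimated on one sample to new predictor values."""
    X1 = np.column_stack([np.ones(len(X_new)), X_new])
    return X1 @ beta

# Simulated, hypothetical data: two predictors (e.g., past sales and advertising)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Estimate on the first 150 cases, predict the remaining 50 "new" cases
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

beta = fit_ols(X_train, y_train)
y_pred = predict_ols(beta, X_test)
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))  # out-of-sample prediction error
print(beta, rmse)
```

Evaluating the error only on cases that were not used for estimation is what distinguishes out-of-sample prediction from in-sample fit.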

This assumption cannot be made in most psychological research, in which constructs (e.g., depression or anxiety) are assumed to be only imperfectly captured by the instruments used to measure them. Latent variable models (LVMs), such as structural equation models (SEMs), address this issue by explicitly incorporating measurement error into the model. An SEM treats the measured variables (indicators) as noisy measures of the unobserved (latent) variables they are intended to capture. In addition, unlike traditional linear models, an SEM imposes restrictions on the covariance matrix of the indicators, based on prior knowledge or expectations about which latent variables underlie which subsets of indicators. These measurement models are combined with structural (regression) models, yielding an overarching model that estimates linear relations among latent and observed variables while accounting for measurement error.
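The combination of measurement and structural models can be written compactly. In one common (LISREL-style) notation, which is our choice since the text above fixes none, let $\mathbf{x}$ be the indicators, $\boldsymbol{\eta}$ the latent variables, $\boldsymbol{\Lambda}$ the factor loadings, $\mathbf{B}$ the regressions among latent variables, and $\boldsymbol{\varepsilon}$, $\boldsymbol{\zeta}$ the measurement and structural residuals. Assuming $\boldsymbol{\varepsilon}$ is uncorrelated with $\boldsymbol{\eta}$ and $\boldsymbol{\zeta}$, and $\mathbf{I}-\mathbf{B}$ is invertible, the model implies the following mean vector and covariance matrix of the indicators:

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\nu} + \boldsymbol{\Lambda}\boldsymbol{\eta} + \boldsymbol{\varepsilon},
  \qquad \operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Theta},\\
\boldsymbol{\eta} &= \boldsymbol{\alpha} + \mathbf{B}\boldsymbol{\eta} + \boldsymbol{\zeta},
  \qquad \operatorname{Cov}(\boldsymbol{\zeta}) = \boldsymbol{\Psi},\\
\boldsymbol{\mu}(\boldsymbol{\theta}) &= \boldsymbol{\nu} + \boldsymbol{\Lambda}(\mathbf{I}-\mathbf{B})^{-1}\boldsymbol{\alpha},\\
\boldsymbol{\Sigma}(\boldsymbol{\theta}) &= \boldsymbol{\Lambda}(\mathbf{I}-\mathbf{B})^{-1}\boldsymbol{\Psi}(\mathbf{I}-\mathbf{B})^{-\top}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}.
\end{aligned}
```

The restrictions on the indicator covariance matrix arise from constraints on these parameter matrices; for instance, fixing a loading in $\boldsymbol{\Lambda}$ to zero encodes the expectation that an indicator does not measure a given latent variable.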

SEMs have primarily been used for explanatory modeling. Devising prediction rules for SEMs, however, has great potential, because it would allow researchers to extend the prediction framework of linear models to include measurement models of latent constructs. To this end, De Rooij et al. (2022) developed a rule for making out-of-sample predictions with SEMs for continuous indicators. In this approach, the parameter estimates of an SEM fitted to one sample (called the generative model) are used to obtain a predictive distribution of the outcomes given the predictors. Essentially, the predictive distribution summarises the relations between the observed predictors and outcomes, so that predicted values for new outcome observations can be obtained whenever new values of the same predictors are available.
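The idea of a predictive distribution derived from a fitted SEM can be sketched for multivariate normal indicators. This is a minimal sketch, assuming the rule takes the standard form of the conditional normal distribution of the outcome indicators given the predictor indicators, under the mean and covariance implied by the model x = ν + Λη + ε, η = α + Bη + ζ. The function names and all parameter values below are hypothetical; in practice the estimates would come from fitting the generative model to a training sample with SEM software.

```python
import numpy as np

def implied_moments(nu, Lam, B, alpha, Psi, Theta):
    """Model-implied mean vector and covariance matrix of the indicators for
    x = nu + Lam @ eta + eps and eta = alpha + B @ eta + zeta."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)  # (I - B)^{-1}
    mu = nu + Lam @ A @ alpha
    Sigma = Lam @ A @ Psi @ A.T @ Lam.T + Theta
    return mu, Sigma

def predictive_distribution(mu, Sigma, x_new, pred_idx, out_idx):
    """Conditional (predictive) normal distribution of the outcome indicators
    given new values of the predictor indicators."""
    S_xx = Sigma[np.ix_(pred_idx, pred_idx)]
    S_yx = Sigma[np.ix_(out_idx, pred_idx)]
    S_yy = Sigma[np.ix_(out_idx, out_idx)]
    W = np.linalg.solve(S_xx, S_yx.T).T  # S_yx S_xx^{-1}, without an explicit inverse
    y_hat = mu[out_idx] + (x_new - mu[pred_idx]) @ W.T  # conditional means, one row per new case
    cond_cov = S_yy - W @ S_yx.T  # conditional covariance, shared by all new cases
    return y_hat, cond_cov

# Hypothetical "generative model" estimates: factor 1 is measured by indicators 0-2
# (the predictors), factor 2 by indicators 3-4 (the outcomes), and eta2 = 0.6 * eta1 + zeta2.
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.7, 0.0], [0.0, 1.0], [0.0, 0.9]])
B = np.array([[0.0, 0.0], [0.6, 0.0]])
Psi = np.diag([1.0, 0.64])
Theta = np.diag([0.3, 0.4, 0.5, 0.3, 0.4])
nu, alpha = np.zeros(5), np.zeros(2)

mu, Sigma = implied_moments(nu, Lam, B, alpha, Psi, Theta)
pred_idx, out_idx = [0, 1, 2], [3, 4]
x_new = np.array([[1.0, 0.8, 0.7], [0.0, 0.0, 0.0]])  # two new cases
y_hat, cond_cov = predictive_distribution(mu, Sigma, x_new, pred_idx, out_idx)
print(y_hat)
print(cond_cov)
```

Because the latent variables are integrated out, only observed predictor values are needed at prediction time; measurement error enters through the implied covariance matrix. Solving the linear system instead of inverting the predictor covariance matrix is a common numerical choice, not a requirement of the rule.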

De Rooij et al. (2022) propose a prediction rule for normally distributed, continuous data. However, out-of-sample predictions cannot yet be derived for other popular SEMs, such as categorical or multilevel SEMs. This project aims to extend the SEM-based prediction rule and to develop prediction mechanisms for other commonly used LVMs. It is divided into four subprojects:
• Project 1: Combines the SEM and ridge regression to compute SEM-informed ridge regression predictions for new observations.
• Project 2: Develops an SEM-based prediction rule for categorical indicators of continuous latent factors (categorical SEMs).
• Project 3: Extends De Rooij et al.’s (2022) prediction rule to multilevel SEMs.
• Project 4: Focuses on out-of-sample prediction for latent class models.

Supervisors
Prof. Dr. Mark De Rooij
Dr. Zsuzsa Bakk
Dr. Julian Karch

Financed by
Internally funded at Leiden University (starter grants of Dr. Zsuzsa Bakk and Dr. Julian D. Karch)

Period
1 August 2024 – 1 July 2028