Methodologie van het Pedagogisch Onderzoek

Faculty of Psychology and Educational Sciences

KU Leuven

**Supervisors**
Prof. Eva Ceulemans & Prof. Wim Van den Noortgate

On September 26th, 2017, Marlies Vervloet will defend her thesis entitled

**Unraveling and unlocking the assets of principal covariates regression**

**Summary**

In the behavioral sciences, researchers often link a criterion to multiple predictors using multiple linear regression. In almost all cases, the main aim of the analysis is to better understand the unique and shared relations between the predictors and the criterion. A common complication is that the set of potential predictors is rather large. This creates an interpretational burden, because the regression weights reflect only the unique effects of the predictors on the criterion and shed no light on shared effects. Moreover, the more predictors there are, the more likely it is that at least some of them will be highly correlated with a linear combination of the others. This phenomenon, known as multicollinearity, is problematic because it leads to unstable regression weights. A promising but overlooked method for dealing with these complications, originally presented in chemometrics, is principal covariates regression (PCovR).
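To illustrate why multicollinearity makes regression weights unstable, the following minimal simulation may help. It is only a sketch under assumed settings (two predictors, no intercept, unit noise variance, and the hypothetical helper names `ols_two_predictors` and `weight_spread`); it is not taken from the thesis. It compares how much the estimated weight of the first predictor varies across samples when the two predictors are uncorrelated versus almost perfectly correlated.

```python
import random
import statistics


def ols_two_predictors(x1, x2, y):
    """Solve the 2x2 normal equations for y = b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def weight_spread(rho, n=50, reps=300, seed=1):
    """Standard deviation of the estimated b1 across simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has population correlation rho with x1 (assumed design).
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        # True weights are both 1; the noise has unit variance.
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(ols_two_predictors(x1, x2, y)[0])
    return statistics.stdev(estimates)


spread_uncorrelated = weight_spread(rho=0.0)
spread_collinear = weight_spread(rho=0.99)
print(f"SD of b1, uncorrelated predictors: {spread_uncorrelated:.3f}")
print(f"SD of b1, collinear predictors:    {spread_collinear:.3f}")
```

In this setup the true weights are identical in both conditions, so any difference in spread comes from the correlation between the predictors alone.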

PCovR tackles these complications through a dimension reduction approach. It captures the main information in the predictors in a limited number of summarizing variables, called components, and simultaneously uses these components to predict the criterion. The most important assets of PCovR are that this simultaneous optimization of reduction and prediction always has a closed-form solution, and that users can choose, through a weighting parameter, the degree to which reduction and prediction are emphasized. Nevertheless, PCovR is not often used in the behavioral sciences because of some remaining obstacles, which we attempt to clear in this doctoral dissertation.
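The closed-form solution and the role of the weighting parameter can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis code or the API of the R package: the function name `pcovr`, the symbol `alpha` for the weighting parameter, and the toy data are all hypothetical. It assumes a single criterion, centered data, and the commonly used eigenvector formulation in which the component scores are the leading eigenvectors of a weighted sum of the predictor cross-product matrix and the cross-product of the fitted criterion values.

```python
import numpy as np


def pcovr(X, y, n_components, alpha):
    """Closed-form PCovR sketch for one criterion (assumes centered data)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Fitted values of y: projection of y onto the column space of X.
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Weighted sum of the reduction part and the prediction part (N x N).
    G = (alpha * (X @ X.T) / np.sum(X ** 2)
         + (1 - alpha) * (y_hat @ y_hat.T) / np.sum(y ** 2))
    # Component scores: eigenvectors of G with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    T = eigvecs[:, order]
    W = np.linalg.lstsq(X, T, rcond=None)[0]  # component weights, T = X W
    P_x = T.T @ X  # loadings; valid because T has orthonormal columns
    p_y = T.T @ y  # regression weights of the criterion on the components
    return T, W, P_x, p_y


def fit_quality(X, y, T, P_x, p_y):
    """Proportion of variance in X and in y reconstructed by the components."""
    y = y.reshape(-1, 1)
    r2_x = 1 - np.sum((X - T @ P_x) ** 2) / np.sum(X ** 2)
    r2_y = 1 - np.sum((y - T @ p_y) ** 2) / np.sum(y ** 2)
    return r2_x, r2_y


# Hypothetical toy data: six predictors, criterion driven by the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X[:, 0] - X[:, 1] + rng.normal(size=100)
X = X - X.mean(axis=0)
y = y - y.mean()

results = {}
for alpha in (1.0, 0.05):
    T, W, P_x, p_y = pcovr(X, y, n_components=2, alpha=alpha)
    results[alpha] = fit_quality(X, y, T, P_x, p_y)
    print(f"alpha={alpha}: R2_X={results[alpha][0]:.3f}, R2_y={results[alpha][1]:.3f}")
```

With `alpha = 1.0` only the reconstruction of the predictors counts, so the components coincide with those of a principal component analysis; smaller values shift the emphasis toward predicting the criterion. A value of exactly 0 is avoided here because the matrix then has rank one and additional components are not uniquely defined.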

In **Chapter 2**, we zoom in on the weighting parameter. We report the results of a literature study and an extensive simulation study on how to tune this parameter. Model selection in PCovR, however, consists not only of selecting the weighting parameter value, but also of selecting the number of components. We propose four model selection strategies in **Chapter 3** and put the performance of these strategies to the test in two simulation studies. Moreover, we compare the obtained PCovR solutions to those that result from two more popular dimension-reduction-based techniques, partial least squares (PLS) and principal components regression (PCR), showing that PCovR outperforms the other two in recovering data-generating components that explain variance in the criterion.

**Chapter 4** compares the performance of PCovR and exploratory structural equation modeling (ESEM). ESEM is a factor-analysis-based method that can be used to estimate PCovR-like models. Finally, in **Chapter 5**, we present the R package **PCovR**. This package allows users to perform all PCovR analysis steps: preprocessing the data, parameter estimation, model selection, and rotating the retained solution for easier interpretation.

**Project**

**Model construction in (multilevel) regression analysis**

Multilevel regression analysis is one of the most popular techniques in educational research. It is used to relate a set of predictors to a criterion when the observations have a nested structure (e.g., pupils nested within classes). One of the major challenges is how to construct an appropriate model: which effects are random and which are fixed, how to avoid multicollinearity problems, …? One of the goals of this project is to propose a new model construction strategy, called multilevel covariates regression. Building on the key principle of Principal Covariates Regression (PCovR; De Jong & Kiers, 1992), this strategy boils down to summarizing the main information in the predictor variables by reducing them to a few components in such a way that the criterion scores can be optimally reconstructed. There are, however, still some gaps concerning the PCovR method that need to be filled. Firstly, it includes a weighting parameter that allows one to emphasize either the reconstruction of the predictors or the prediction of the criterion, but it is unknown how this parameter influences the performance of the method and how an appropriate value should be selected. Secondly, PCovR code is not yet available in a non-commercial software program. Thirdly, it is not known how PCovR compares to Exploratory Structural Equation Modeling, which is a similar but stochastic approach. Once these issues have been resolved, multilevel covariates regression models and associated algorithms will be developed, and simulation studies will be set up to evaluate their performance.
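For reference, the trade-off governed by the weighting parameter is commonly written as the following least-squares criterion. The notation is an assumption made here for illustration and does not appear in the project description: $\mathbf{X}$ holds the predictor scores, $\mathbf{y}$ the criterion scores, $\mathbf{T} = \mathbf{X}\mathbf{W}$ the component scores, $\mathbf{P}_X$ the loadings, $\mathbf{p}_y$ the regression weights, and $\alpha \in [0, 1]$ the weighting parameter.

```latex
L(\mathbf{W}, \mathbf{P}_X, \mathbf{p}_y)
  = \alpha \, \frac{\lVert \mathbf{X} - \mathbf{T}\mathbf{P}_X \rVert^2}{\lVert \mathbf{X} \rVert^2}
  + (1 - \alpha) \, \frac{\lVert \mathbf{y} - \mathbf{T}\mathbf{p}_y \rVert^2}{\lVert \mathbf{y} \rVert^2},
  \qquad \mathbf{T} = \mathbf{X}\mathbf{W}
```

Values of $\alpha$ close to 1 emphasize the reconstruction of the predictors, whereas values close to 0 emphasize the prediction of the criterion.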

**Financed by**
KU Leuven