Maarten Marsman

Central Institute for Educational Measurement (CITO)
Arnhem, The Netherlands

Supervisors:
– prof.dr. C.A.W. Glas (Twente University)
– prof.dr. Karel Brookhuis (University of Groningen)
– dr. M.J.H. van Onna (Twente University)

On November 19th 2014, Maarten Marsman defended his thesis entitled

Plausible Values in Statistical Inference

Summary
A Plausible Value is a draw from the posterior ability distribution of a pupil, given the pupil’s responses to the items on a test and any additional information that is available about the pupil (e.g. gender, age). Plausible Values are commonly used in educational surveys as a tool for secondary analyses performed by researchers who lack the sophisticated resources needed to estimate latent regression models (i.e. to regress the latent abilities on the covariates). Plausible Values can be used as dependent variables in the regression model, turning a latent regression problem into a manifest regression problem that can be estimated in standard statistical software.
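To make the definition concrete, the following Python sketch draws Plausible Values for a single pupil. It is an illustration under stated assumptions, not the method of the thesis: it assumes a Rasch model with known item difficulties, a normal prior whose mean would come from the latent regression on the covariates, and a simple grid approximation of the posterior. The function names and the example numbers are hypothetical.

```python
import math
import random


def rasch_prob(theta, difficulty):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def draw_plausible_values(responses, difficulties, prior_mean, prior_sd,
                          n_draws=5, rng=None):
    """Draw ability values from a grid approximation of the posterior.

    responses    : list of 0/1 item scores for one pupil
    difficulties : assumed-known Rasch item difficulties
    prior_mean   : mean of the normal prior, e.g. the latent regression
                   prediction for this pupil's covariates (hypothetical)
    prior_sd     : standard deviation of the normal prior
    """
    rng = rng or random.Random()
    # Ability grid from -6 to 6 in steps of 0.01.
    grid = [-6.0 + i / 100.0 for i in range(1201)]

    log_post = []
    for theta in grid:
        # Log prior (up to a constant) plus log likelihood of the responses.
        value = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for x, b in zip(responses, difficulties):
            p = rasch_prob(theta, b)
            value += math.log(p) if x == 1 else math.log(1.0 - p)
        log_post.append(value)

    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]

    # Each Plausible Value is one draw from the (discretised) posterior.
    return rng.choices(grid, weights=weights, k=n_draws)


# Illustrative use with made-up item difficulties and responses.
example_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
example_draws = draw_plausible_values([1, 1, 0, 1, 0], example_difficulties,
                                      prior_mean=0.0, prior_sd=1.0,
                                      rng=random.Random(2014))
print(example_draws)
```

Once such draws are available for every pupil, they can be regressed on the covariates with ordinary regression software, which is the manifest regression problem described above.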

The regression model that is estimated with Plausible Values has already been estimated, however, because it is used to generate the Plausible Values (it is the prior distribution for ability). Why, then, are Plausible Values needed? This question is answered in Chapter 2. First, it is shown that the marginal distribution of Plausible Values converges monotonically to the true distribution of ability. From this result one can show that the marginal distribution of Plausible Values is a better estimator of the true ability distribution than the latent regression model (properly seen as a prior), except when the latent regression model is equal to the true ability distribution.

New simulation methods to efficiently sample Plausible Values in large-scale settings are proposed in Chapters 3 and 4. A rejection algorithm is described in Chapter 3 and several Metropolis-Hastings algorithms are described in Chapters 3 and 4. A convenient feature of the proposed algorithms is that they are well suited for parallel computation and that they scale (i.e. become more efficient when applied to larger datasets), whereas certain existing methods become less efficient as the number of observations increases, as shown in Chapter 5.
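As an illustration of how a rejection scheme can produce a Plausible Value, the Python sketch below proposes an ability from the prior, simulates a response pattern for that ability, and accepts the proposal when the simulated sum score equals the observed one. This is a simplified sketch and may differ from the algorithm in Chapter 3: it assumes a Rasch model (so that the sum score is a sufficient statistic for ability), known item difficulties and a normal prior. The function name and the `max_tries` safeguard are hypothetical.

```python
import math
import random


def sample_pv_by_rejection(sum_score, difficulties, prior_mean, prior_sd,
                           rng, max_tries=100000):
    """Draw one Plausible Value by rejection sampling (Rasch model assumed)."""
    for _ in range(max_tries):
        # 1. Propose an ability from the prior distribution.
        theta = rng.gauss(prior_mean, prior_sd)
        # 2. Simulate a response to every item for the proposed ability.
        simulated_score = 0
        for b in difficulties:
            p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
            if rng.random() < p_correct:
                simulated_score += 1
        # 3. Accept when the simulated sum score matches the observed one;
        #    accepted values are draws from the posterior given that score.
        if simulated_score == sum_score:
            return theta
    raise RuntimeError("no proposal accepted within max_tries")


# Illustrative use with made-up item difficulties and an observed score of 3.
rng = random.Random(2014)
example_pv = sample_pv_by_rejection(3, [-1.0, -0.5, 0.0, 0.5, 1.0],
                                    prior_mean=0.0, prior_sd=1.0, rng=rng)
print(example_pv)
```

In a sketch of this kind the draw for each pupil does not depend on the draws for other pupils, so the work can be divided over processors, which is in line with the suitability for parallel computation noted above.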

The methodology that is the topic of this thesis is highly practical, and small pieces of R code are provided in the appendices to help those who wish to try out the proposed methods for themselves. In some cases, an appendix provides material that was simply too much fun to leave out.

This thesis was written with large-scale educational surveys in mind, focusing on latent regression models that use item response theory (IRT) to model the distribution of item responses, although Chapter 4 also covers Ising network models. The results contained in this thesis, however, readily generalize to other applications and models.

Project

Simulator-based automatic assessment of driving performance
The purpose of this PhD project is to design a reliable and valid automatic performance scoring system for a simulator-based driving test.

In order to design a simulator test, apart from optimizing the technical or virtual presentation of the scenarios in the simulator, several statistical and methodological problems have to be tackled. First, because performance in the simulator cannot yet be scored automatically, assessors have to be used to obtain evaluations of pupil driver behaviour. A cognitive model is developed at TNO that learns the relation between the ratings of assessors and the objective performance measures registered by the simulator. Since the quality of the cognitive model depends on the quality of the information provided by the assessors, a sound IRT-based measurement model for the assessors’ data has to be developed to feed the cognitive model with optimal information.

The output of the cognitive model will be used to select objective measures which are good predictors of the judgements of the assessors. Then a compound IRT model will be designed where one element is the IRT-based measurement model for the assessor judgements and the other an IRT model for assessment based on the selected predictors.

When the test has been designed and the models have been developed and validated, two projects remain. First, a cross-sectional study will be performed to create norm distributions for groups defined as beginning pupil drivers, advanced pupil drivers, licence candidates, drivers one year post-licence, and very experienced drivers. Second, the assessors’ and simulator assessment scores will be correlated with additional measurements of supposedly related cognitive processes involved in driving, in particular in-car performance assessments, self-evaluations of driving competence and the Cito Drive computer-based tests of responsible driving.

This project was financed by Cito/RCEC.