Hannah Heister

Faculty of Behavioural and Social Sciences
Psychometrics and Statistics
University of Groningen


What is normal? Accurate norms and their use for psychological tests.

Psychological tests, such as intelligence tests, are widely used in research and in individual assessment, for example for screening in clinical practice. Test scores are interpreted on the basis of normed scores, which express how normal (i.e., common) an individual’s performance is compared to that of their reference population. Currently, norms are less accurate than they could be, norm construction requires more effort than necessary, and norms are often not communicated effectively to test users. All this hampers proper and effective norm use. This is problematic because it may slow scientific progress and may lead to wrong decisions in individual assessments, with possibly serious negative consequences.
We will develop advanced norming methods that make (1) scoring tests, (2) designing normative studies, and (3) interpreting test results more precise and simpler than is possible with currently available methods. We do so by developing (1) two item response theory-based norming methods and a multivariate norming method, (2) methods for optimal design and sequential sampling, and (3) a strategy for communicating test norms to test users. Upon completion, the project will have resulted in a complete methodology for optimal norming: from designing a normative study, to calculating the norms, to communicating them to end users. This makes it possible to construct norms with the desired accuracy at the least possible effort, which facilitates the development, maintenance and use of high-quality, effective psychological tests. All this promotes proper and effective test interpretation, which greatly aids test practice, for the benefit of both science and the individuals tested.

My part within the research group is (1), the development of IRT-based norming methods. To this end, I aim to develop a parametric and a non-parametric model, as well as a multivariate norming method.
This is important because current norming practice yields needless inaccuracies, relating to a subtest’s reliability and validity, for two reasons. First, unweighted sum scores (i.e., added-up item scores) fail to account for differences in item quality. Treating all items as equally informative is unrealistic. For example, the Symptom Distress Scale (SDS) aims at measuring the trait distress. The item ‘Intensity of pain’ is of higher quality (i.e., measures distress better) than ‘Deterioration of appearance’ (Ramsay et al., 2020). Failing to account for these differences needlessly enlarges the inaccuracy in measuring the trait (Ramsay & Wiberg, 2017) and may result in unacceptable norms. A solution is to use item response theory (IRT; Embretson & Reise, 2000), which may increase the accuracy substantially. For example, for the SDS, IRT trait estimates appeared to be four times more accurate than sum scores (Ramsay et al., 2020). This is a major improvement. However, IRT is currently not suitable for continuous norming: it often appears impossible to find a well-fitting IRT model that includes covariates such as age, as is required for continuous IRT norming. Thus, we will develop new IRT tools to make continuous IRT norming possible.
Second, norms based on a single subtest can be substantially less valid than those based on two (or more) subtests simultaneously. In test practice, a common situation is that an individual’s performances on two subtests are each within the normal range, while the combination of both is not. For example, on a neuropsychological test, a testee could be rather slow and also make frequent errors. Either score is within the limits of normative performance, yet the combination is a ‘red flag’ for serious impairment. Another example is a testee with a large discrepancy between verbal and performance scores, which is common in autism. Such ‘red flags’ can only be detected by considering both subtests together, expressed in a single norm.
This is not done in practice yet. The current multivariate norming method (Huizenga et al., 2016) falls short, as it relies on overly stringent statistical assumptions (e.g., multivariate normality of subtest scores). Thus, to obtain valid norms by taking advantage of multiple subtests, and to broaden the types and number of tests to which multivariate norming can be applied, we will develop multivariate norming methods founded on realistic assumptions.
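The ‘red flag’ situation can be illustrated with a minimal sketch. It is not one of the methods to be developed in this project, nor the method of Huizenga et al. (2016): it deliberately assumes bivariate normality of two standardized subtest scores, which is the kind of stringent assumption the project aims to relax, and it is used here only to show the phenomenon. The function names, the correlation of -0.6 between slowness and error rate, and the testee’s scores are hypothetical.

```python
import math


def univariate_p(z):
    """Two-sided tail probability of one standardized score under normality."""
    return math.erfc(abs(z) / math.sqrt(2))


def mahalanobis_sq(z1, z2, rho):
    """Squared Mahalanobis distance of two standardized scores with correlation rho."""
    return (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)


def joint_p(z1, z2, rho):
    """Probability of a score pair at least this far from the norm-group centre.

    Under the (assumed) bivariate normality, the squared distance follows a
    chi-square distribution with 2 degrees of freedom, whose upper tail
    probability is exp(-d2 / 2).
    """
    return math.exp(-mahalanobis_sq(z1, z2, rho) / 2)


# Hypothetical values, not taken from any real test or norm group.
RHO = -0.6                     # assumed: slower testees tend to make fewer errors
z_slowness, z_errors = 1.5, 1.5  # testee is rather slow AND makes many errors

p_slowness = univariate_p(z_slowness)
p_errors = univariate_p(z_errors)
p_joint = joint_p(z_slowness, z_errors, RHO)

print(f"slowness alone: p = {p_slowness:.3f}")
print(f"errors alone:   p = {p_errors:.3f}")
print(f"both together:  p = {p_joint:.4f}")
```

With these assumed values, each score on its own lies within the conventional range of |z| < 1.96, whereas the joint probability falls well below 0.01, because being both slow and error-prone runs against the assumed negative correlation in the norm group. When subtest scores are skewed or bounded, these probabilities can be misleading, which is why norming methods founded on more realistic assumptions are needed.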

Prof. dr. M. E. Timmerman
Prof. dr. C. J. Albers
Prof. dr. Marie Wiberg

Financed by

February 2023 – February 2027