Sebastian Lortz

Nieuwenhuis Institute for Educational Research
Department Child and Family Welfare
Faculty of Behavioral and Social Sciences
University of Groningen

Email
Website

Project
Models for Computer Adaptive Testing of Cognitive Functions

Cancer survivors often experience cognitive impairment following cancer and its treatment; it affects 75% of patients with brain tumours and 30% of those with other cancers (Kim et al., 2023; Gibson & Monje, 2021; Schagen et al., 2022; Noll et al., 2019; Mayo et al., 2021). Cognitive impairment impacts daily functioning, and its assessment is essential because it clarifies the nature of the problem and directs treatment planning (Noll et al., 2018). However, diagnosing cognitive impairment in cancer survivors is challenging due to its multifactorial aetiology, the variability of symptoms, and the potential impact of both physical and psychosocial comorbidities on cognition (ECCC et al., 2022). Thus, gold-standard diagnoses rely on a comprehensive, time-intensive neuropsychological evaluation (in-person testing and self-report). Yet, as the number of cancer survivors in the Netherlands is expected to rise to nearly 1.4 million by 2035 (IARC, n.d.; IKNL, 2019), a 3-4-hour neuropsychological assessment poses a capacity bottleneck.

To address these challenges, our project team (investigators listed on the title page) will develop a home-based and time-efficient screening instrument using computerised adaptive testing (CAT; De Ayala, 2013). We aim for a brief and self-administered instrument that minimizes burden by reducing unnecessary items for unimpaired patients and that prioritizes high sensitivity to avoid missed clinical cases. The CAT test results will be shared with the treating clinician, who will then decide (based on the test scores) whether the patient needs to be invited for a gold-standard neuropsychological evaluation or referred to self-management sessions. This streamlines triage, speeds up decision-making, and concentrates specialist time on the right patients.

In a CAT, an algorithm selects each question in real time based on prior responses (van der Linden, 2016; Thomas, 2019), choosing the most informative next item from IRT-calibrated item banks (Wainer, 2000; van der Linden & Glas, 2010). This adaptiveness increases information density, cutting test length by 50-80% without loss of sensitivity (Harrison et al., 2017). Efficiency improves further (i.e., shorter test length at the same accuracy) when multiple domains are measured in one CAT (Paap et al., 2018, 2019). In sum, CAT provides an efficient, accurate, and scalable test framework for screening solutions.
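As a minimal illustration (not the project's implementation), the "most informative next item" principle can be sketched as maximum-information item selection under the two-parameter logistic (2PL) model; the item parameters below are invented for demonstration only:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def select_next_item(theta_hat, bank, administered):
    """Pick the unadministered item with maximum information at theta_hat."""
    best, best_info = None, -np.inf
    for idx, (a, b) in enumerate(bank):
        if idx in administered:
            continue
        info = item_information(theta_hat, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best

# Toy bank of (discrimination a, difficulty b) pairs; item 0 was already given
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]
next_item = select_next_item(theta_hat=0.4, bank=bank, administered={0})
```

In this sketch, the highly discriminating item with difficulty near the current ability estimate wins; an operational CAT would add exposure control and content balancing on top of this criterion.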

Currently, no oncology-specific cognitive screening instrument exists. Existing cognitive screening tests (see e.g., De Roeck et al., 2019; Tsoi et al., 2015) generally perform poorly in cancer populations (Koppelmans et al., 2012; Meyers & Wefel, 2003; Racine et al., 2015; Sun et al., 2011). Available tests are insufficiently challenging and largely memory-focused, overlooking attention, processing speed, and executive function (Kim et al., 2023). While calls for the application of CATs are increasing, CAT remains underused in neuropsychology (Bilder & Reise, 2019; Loring et al., 2022; Di Sandro et al., 2024). Part of the gap is content-related: traditional neuropsychological tests have not been designed for adaptive use, so neuropsychologists face the challenge of redesigning them to be CAT-compatible. Part of the gap is methodological: traditional tests are not readily compatible with the most widely used IRT models (e.g., the two-parameter logistic model). Because accuracy and response time cannot be disentangled in existing instruments, we need new strategies to tailor or combine advanced modelling techniques that can handle such data. Our multi-disciplinary project team, bridging oncology, clinical neuropsychology, and psychometrics, is well-positioned to overcome these challenges and develop a neuroCAT that substantially advances care while generating methodological innovations along the way.

This PhD project comprises two parallel workstreams in collaboration with the Netherlands Cancer Institute (NKI) and the software provider Eyra: (A) the project team (investigators listed on the title page) will develop and implement an adaptive clinical screening system, and (B) the psychometrics team (i.e., my supervisors and I) will advance psychometric research by addressing core test development challenges.

In workstream (A), we will build item banks (per sub-test) consisting of a large pool of items spanning a wide range of difficulty levels, and we will estimate the banks' item parameters using empirical data. We address the challenges described above by adapting and evaluating the most relevant IRT models (continuous abilities) and/or cognitive diagnostic models (discontinuous abilities). We will incorporate auxiliary information via response time (e.g., joint hierarchical modelling (JHM); van der Linden, 2007) and via previously completed tests (earlier ability estimates inform subsequent tests) to improve estimation precision. Since CAT performance depends on how items are chosen and on when and how the test ends, we will adopt an item-selection approach and a stopping rule that balance precision and participant burden. The CAT's performance will be evaluated by primary outcomes (e.g., estimation error, classification accuracy) and secondary outcomes (e.g., test-length reduction). We will assess these outcomes through both simulations (i.e., comparing the CAT to full-item batteries) and empirical studies (i.e., comparing the CAT with the gold-standard assessment). Once CAT development is completed, Eyra will deploy the CAT as a browser-based application for large-scale use.
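A precision-based stopping rule of the kind described above can be sketched as follows. This is a hedged toy simulation, not the project's design: item parameters, the standard-error threshold of 0.4, and the maximum length of 10 items are all illustrative choices, and ability is estimated by simple grid-based EAP.

```python
import numpy as np

rng = np.random.default_rng(7)
grid = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * grid**2)            # standard-normal prior (unnormalised)

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(responses, items):
    """Posterior mean and SD of theta given (item, response) pairs."""
    post = prior.copy()
    for (a, b), u in zip(items, responses):
        p = p2pl(grid, a, b)
        post *= p if u == 1 else (1.0 - p)
    post /= post.sum()
    mean = (grid * post).sum()
    sd = np.sqrt(((grid - mean) ** 2 * post).sum())
    return mean, sd

# Toy bank: all combinations of discrimination a and difficulty b
bank = [(a, b) for a in (0.8, 1.2, 1.6) for b in (-2, -1, 0, 1, 2)]
true_theta, used, items, resp = 0.8, set(), [], []
theta_hat, se = 0.0, np.inf
while se > 0.4 and len(items) < 10:       # stop: precise enough, or max length
    # maximum-information selection at the current ability estimate
    idx = max((i for i in range(len(bank)) if i not in used),
              key=lambda i: bank[i][0]**2 * p2pl(theta_hat, *bank[i])
                            * (1 - p2pl(theta_hat, *bank[i])))
    used.add(idx)
    a, b = bank[idx]
    resp.append(int(rng.random() < p2pl(true_theta, a, b)))  # simulate answer
    items.append((a, b))
    theta_hat, se = eap(resp, items)
```

In a simulation study, loops like this would be run over many simulees to estimate primary outcomes (estimation error, classification accuracy) and secondary outcomes (test-length reduction) against the full-item battery.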

In workstream (B), research will be conducted addressing the challenges identified in (A), and the two streams co-evolve as choices in one inform the other. First, we will determine the minimum sample size required to calibrate item banks with the JHM accurately and, in turn, achieve stable CAT performance. While the CAT literature recommends 500-1000 participants for item bank calibration (Paap et al., 2018), there is little guidance for calibration with the JHM. We will also examine the minimum sample sizes needed for criterion validation studies that compare CAT classifications with a gold-standard test. In addition, we will develop time-aware item-selection strategies for a JHM-based CAT that choose each next item by its expected accuracy gain per unit of testing time. We will then compare two adaptive approaches: an estimation CAT that estimates abilities and then applies a cutoff; and a classification CAT that selects items and stops specifically to make the pass/fail call at that cutoff. Finally, we will evaluate and refine strategies for combining multiple sub-test outcomes into a single, clinician-actionable decision within one adaptive test.
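The time-aware selection idea can be illustrated with a small sketch: instead of ranking candidate items by Fisher information alone, rank them by information per expected second. The expected response times below are made-up constants; in a JHM-based CAT they would come from the calibrated response-time model.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Toy bank: (discrimination, difficulty, expected seconds); values invented
bank = [(1.4, 0.0, 30.0),   # informative but slow
        (1.0, 0.2, 10.0),   # less informative, fast
        (1.2, -0.5, 20.0)]

theta_hat = 0.1
# Classic criterion: maximum information, ignoring time
by_info = max(range(len(bank)), key=lambda i: info_2pl(theta_hat, *bank[i][:2]))
# Time-aware criterion: maximum information per expected second
by_rate = max(range(len(bank)),
              key=lambda i: info_2pl(theta_hat, *bank[i][:2]) / bank[i][2])
```

Here the two criteria disagree: the information-only rule picks the slow, highly informative item, while the time-aware rule prefers the fast item that buys more precision per second of testing time, which is exactly the trade-off these strategies are meant to optimise.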

Taken together, I will (A) contribute to delivering a home-based neuroCAT that focuses specialist time where it is needed, and (B) advance psychometrics by establishing generalizable solutions for efficient, reliable, and clinically actionable adaptive testing.

Supervisors
Dr. Muirne Paap
Dr. Niek Frans
Prof. Dr. Sanne Schagen
Dr. Joost Agelink van Rentergem

Financed by
KWF

Period
1 September 2025 – 31 August 2029