**Project**

*Statistical/psychometric techniques for digital learning applications*

Over the last two decades, digital learning environments have become widespread in formal and informal education (Debeer et al., accepted). Moreover, in the past year, COVID-19-related lockdown measures further boosted online learning activities and demonstrated the importance of digital educational technologies when in-classroom learning became impossible. Yet even outside of these extreme conditions, digital learning environments have several advantages. First, they permit the direct and automated analysis of learner input and interactions, making it possible to assess learners’ progress, even without including explicit intermediate tests that could interrupt the learning process, and to provide learners and teachers with immediate and diagnostic feedback (Klinkenberg, 2011). Second, they allow the incorporation of game-based elements, which have been shown to trigger repeated practice and foster learning in young children (Griffith et al., 2020). Finally, they can facilitate the personalization of the learning experience, by automatically presenting content that is adapted to the individual learner (i.e., adaptivity; Plass & Pawar, 2020). Thus, digital learning environments can accomplish the idea of deliberate practice (Ericsson, 2006) at a large scale, thereby allowing each learner to realize their full learning potential (Hofman et al., 2020).

Computerized Adaptive Practice (CAP) aims to fulfill the promises of digital learning described above. Typically focusing on skills that require repeated practice – such as early numerical and reading skills – CAP combines targeted practice with game-based elements in an attractive narrative (Debeer et al., accepted). Inspired by Computerized Adaptive Testing (CAT; van der Linden & Glas, 2000), CAP achieves individualized practice by selecting exercises (or items) that depend on the skill level of the learner (Wauters et al., 2010). Whereas CAT purely aims to optimize the skill level assessment by repeatedly presenting the most (statistically) informative items given previous performance, CAP focuses on stimulating the learning process. Items are selected to challenge but not demotivate the learner, and instructive feedback is provided immediately (Klinkenberg, 2011).
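To make this selection principle concrete, here is a minimal sketch of one common approach: pick the item whose predicted success probability is closest to a target rate. The logistic success-probability model and the 75% target are illustrative assumptions, not fixed properties of CAP environments.

```python
import math

def p_correct(theta, beta):
    """Predicted probability of a correct response for a learner with
    skill rating theta on an item with difficulty rating beta
    (logistic model, as in ERS-based systems)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def select_item(theta, difficulties, target=0.75):
    """Return the id of the item whose predicted success rate is
    closest to the target rate (e.g., 75%): challenging, but not
    demotivating. `difficulties` maps item ids to difficulty ratings."""
    return min(difficulties,
               key=lambda item: abs(p_correct(theta, difficulties[item]) - target))

# For a learner with rating 0.0, an item of difficulty about -1.1
# yields a predicted success rate near 75%:
chosen = select_item(0.0, {"easy": -1.1, "medium": 0.0, "hard": 1.0})
```

A real CAP environment would typically add randomization around the target to avoid always presenting the same item.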

This project focuses on the psychometric models and algorithms that enable CAP, generally referred to as rating systems. Unlike traditional CAT methods, which require expensive prior calibration of large sets of items, rating systems efficiently update the item (and person) ratings on the fly. After every person-item interaction, the parameters (or ratings) are dynamically adjusted based on the difference between the observed response and what was expected given the current ratings. Efficient rating systems for CAP should meet several requirements. First, because learners are expected to improve moment by moment and item difficulties are known to drift over time (Bock et al., 1988), rating systems should be able to dynamically track changes in skill level and item difficulty. At the same time, the assessments should be accurate and stable enough to uphold efficient individualized practice and thereby bolster learning. Second, rating systems should result in meaningful diagnostic feedback about the current skill level and about the progress over time, both to inform learners, teachers, and parents, and to trigger interventions when slow learning is detected. Third, rating systems should be adjustable to the particularities of different CAP environments, and efficiently make use of the available information by embodying scientific knowledge related to the practiced skills. Finally, the algorithms underlying the rating systems should be computationally efficient and scalable to massive simultaneous use.

About a decade ago, the Elo Rating System (ERS; Elo, 1978) – originally introduced to rate chess players – was proposed to flexibly track skill levels and item difficulties in CAP (Brinkhuis & Maris, 2009). For binary outcomes (i.e., correct-incorrect) in learning environments, the ERS is based on the Rasch model (van der Linden, 2018):

$$P(X_{pi}=1)=E(X_{pi})=\frac{e^{\theta_p-\beta_i}}{1+e^{\theta_p-\beta_i}} \qquad (1)$$

where $X_{pi}$ is the response of person $p$ on item $i$, $P(X_{pi}=1)$ is the probability of a correct response, and $\theta_p$ and $\beta_i$ are the person's skill and the item's difficulty rating, respectively. The updating rules for the person and item ratings are:

$$\theta_{p,t+1}=\theta_{p,t}+K\times\left(X_{pi,t}-E(X_{pi,t})\right) \qquad (2)$$

$$\beta_{i,t+1}=\beta_{i,t}+K\times\left(E(X_{pi,t})-X_{pi,t}\right) \qquad (3)$$

In Equations 2 and 3, the update of the ratings from time $t$ to time $t+1$ depends on the difference between the observed response $X_{pi,t}$ and the expected response $E(X_{pi,t})$, combined with an update weight $K$. This update weight $K$ sets the maximum size of the update after every step, thereby specifying how fast the system can track changes in the skill level and how strongly the ratings fluctuate.
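Equations 1–3 can be sketched in a few lines of code. The value $K=0.25$ below is only an example; in practice the update weight is tuned to the environment.

```python
import math

def expected(theta, beta):
    """Rasch model (Equation 1): probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def ers_update(theta, beta, x, k=0.25):
    """One ERS step (Equations 2 and 3).
    x is the observed response: 1 (correct) or 0 (incorrect).
    Returns the updated (theta, beta) ratings."""
    e = expected(theta, beta)
    theta_new = theta + k * (x - e)   # Equation 2
    beta_new = beta + k * (e - x)     # Equation 3
    return theta_new, beta_new

# A correct response raises the person rating and lowers the item rating
# by the same amount, scaled by the prediction error:
theta, beta = ers_update(0.0, 0.0, 1)
```

Note that the two updates are mirror images: whatever the person rating gains, the item rating loses, so the prediction error drives both parameters in opposite directions.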

Recently, the Urnings rating system was proposed as a promising alternative to the ERS (Bolsinova et al., 2020; Hofman et al., 2020). In Urnings, person and item ratings are represented by urns that contain a specific number of balls (where the number of balls is referred to as the size of the urn). The balls are either green or red, and the proportion of green balls in an urn corresponds to the current skill or item difficulty rating. As in the ERS, ratings are updated after every response (by changing the color of one ball: red to green, or green to red). Hence, the size of the urn has the same function as the $K$-value in the ERS, as it determines the size of a rating update. For the mathematical details, we refer to Bolsinova et al. (2020).
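The urn bookkeeping can be illustrated with a toy sketch. To be clear, this is not the full Urnings algorithm (which involves simulated responses and an acceptance step; see Bolsinova et al., 2020); it only shows how ratings live on a discrete 0–1 grid whose resolution is set by the urn size.

```python
def urn_rating(greens, size):
    """Current rating: the proportion of green balls in the urn."""
    return greens / size

def toy_urn_update(person_greens, item_greens, size, x):
    """Toy illustration of the urn bookkeeping (NOT the full Urnings
    update): after a response, one ball changes color in each urn.
    A correct response (x == 1) turns one person ball green and one
    item ball red; an incorrect response does the reverse. Each step
    therefore changes a rating by exactly 1/size, which is why the
    urn size plays the role of the K-value in the ERS."""
    if x == 1:
        person_greens = min(size, person_greens + 1)
        item_greens = max(0, item_greens - 1)
    else:
        person_greens = max(0, person_greens - 1)
        item_greens = min(size, item_greens + 1)
    return person_greens, item_greens

# One correct response moves both ratings by one ball (1/20 = 0.05 here):
p, i = toy_urn_update(10, 10, 20, 1)
```

A larger urn thus means smaller, more stable rating steps, while a smaller urn tracks change faster but fluctuates more, mirroring the trade-off governed by $K$ in the ERS.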

The ERS has proven to be an efficient rating system for CAP. In recent years, the ERS was successfully implemented in multiple digital learning environments, of which Math Garden is a well-known example (Klinkenberg et al., 2011). However, ERS-based rating systems are not without issues. More specifically, with respect to the requirements that an efficient rating system should meet (cf. above), we see several challenges that will be addressed in this project.

**Supervisor**

Prof. dr. W. van den Noortgate

Prof. dr. H.L.J. van der Maas

Dr. M.A. Bolsinova

Dr. A.D. Hofman

**Financed by**

FWO

**Period**

September 2022 – September 2026