Prof. C.A.W. Glas, Dr A.A. Béguin
On October 29th 2015, Khurren Jehangir defended his thesis entitled
The use of item response theory for scaling in educational surveys
This project focuses on the application of item response theory (IRT) in the context of large-scale international educational surveys, such as PISA, TIMSS, CIVICS and PIRLS. Although IRT methodology has been widely used in educational applications such as test construction, norming of examinations, detection of item bias, and computerized adaptive testing, large-scale educational surveys present a number of specific problems. Several of these problems are addressed in this project.
The first problem relates to the detection of cultural bias over countries. Statistical tests to detect item bias are available, but the sheer numbers of students (over 10,000) and countries (between 30 and 70) raise feasibility problems related to the power of the tests and to the presentation of the test results, which has to be concise and meaningful. The test statistics will therefore probably need to be redefined, and functions of these statistics need to be developed that convey the seriousness of model violations in relation to the inferences that need to be made.
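As an illustration of the kind of bias test referred to here, the classical Mantel-Haenszel procedure compares the odds of a correct response between a reference and a focal group within strata of matched ability. The following minimal Python sketch is not taken from the thesis; the function name and data layout are hypothetical, and a production analysis would add continuity corrections and significance tests.

```python
import numpy as np

def mantel_haenszel_dif(responses, group, item, strata):
    """Mantel-Haenszel common odds ratio for one studied item.

    responses : (n_persons, n_items) matrix of 0/1 scores
    group     : per-person label, 0 = reference, 1 = focal
    item      : column index of the studied item
    strata    : per-person matching score (e.g. rest score on the other items)

    Returns the common odds ratio; values far from 1 suggest item bias.
    """
    num = den = 0.0
    for s in np.unique(strata):
        mask = strata == s
        ref = mask & (group == 0)
        foc = mask & (group == 1)
        a = responses[ref, item].sum()   # reference group, correct
        b = ref.sum() - a                # reference group, incorrect
        c = responses[foc, item].sum()   # focal group, correct
        d = foc.sum() - c                # focal group, incorrect
        t = a + b + c + d
        if t == 0:
            continue
        num += a * d / t
        den += b * c / t
    return num / den if den > 0 else np.nan
```

With 30 to 70 countries, such a statistic would be computed per item for every country against the pooled remainder, which is exactly where the reporting and power problems mentioned above arise.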
The second problem relates to the modeling of item bias. One possibility that will be investigated is to model item bias by adding country-specific item parameters, or item parameters that are random over countries. A related problem is the definition of test statistics that assess the appropriateness of the bias model.
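The random-item-parameters formulation can be made concrete with a small simulation: each country's item difficulties are the common difficulties plus a normally distributed country-specific deviation. The sketch below uses the Rasch model for simplicity; all numbers and names are illustrative assumptions, not values from the project.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_rasch(theta, b):
    """Simulate 0/1 responses under the Rasch model P(X=1) = logistic(theta - b)."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

n_countries, n_per_country, n_items = 5, 200, 10
base_b = np.linspace(-1.5, 1.5, n_items)  # common item difficulties
sigma_b = 0.3                             # sd of the random country deviations

data = []
for c in range(n_countries):
    # Country-specific difficulties: the common value plus a random
    # country effect -- the "random over countries" formulation above.
    b_c = base_b + rng.normal(0.0, sigma_b, n_items)
    theta = rng.normal(0.0, 1.0, n_per_country)
    data.append(simulate_rasch(theta, b_c))
```

Estimating sigma_b (or country-specific offsets) from such data, and testing whether the resulting bias model fits, is the statistical problem sketched in this paragraph.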
The third problem relates to the combination of IRT measurement models with multilevel structural models that relate cognitive outcomes to background variables. Several procedures are available (concurrent and two-step procedures, maximum likelihood, Bayesian procedures, and plausible-value imputation); a study will be made of the relative merits and disadvantages of these methods.

The fourth problem relates to linking surveys, predominantly over cycles within a survey, but possibly also between surveys. Linking over cycles is possible because a survey such as PISA retained a number of cognitive items and background questions over the cycles (2000, 2003, 2006 and 2009). Linking between surveys may be supported by common items and questions or by a common framework; in the latter case, a dedicated linking design may be called for. The psychometric problems related to these forms of linking, pertaining both to the measurement model and to the structural model, will be investigated.
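One elementary instance of linking over common items is mean/sigma linking: the item difficulties estimated in a new cycle are placed on the old cycle's scale through a linear transformation fitted to the common items. The sketch below is a textbook illustration under stated assumptions (Rasch-type difficulties, a shared item set), not the method used in the thesis.

```python
import numpy as np

def mean_sigma_link(b_new, b_old):
    """Mean/sigma linking constants for common item difficulties.

    Returns (A, B) such that A * b_new + B places the new cycle's
    difficulties on the old cycle's scale.
    """
    A = np.std(b_old, ddof=1) / np.std(b_new, ddof=1)
    B = b_old.mean() - A * b_new.mean()
    return A, B
```

When two surveys share only a framework rather than common items, no such direct transformation exists, which is why a dedicated linking design would be needed in that case.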
The supervisors of this research project are involved in a consortium led by Cito to implement Core B (the background questionnaires) of the fourth cycle of PISA for the OECD. The proposed methods will be evaluated using examples from the various PISA cycles. However, they will also be evaluated using data from the TIMSS project and from national assessments such as PPON and NAEP.
This project was financed by the University of Twente.