Eva Zijlmans

Methodology and Statistics
Tilburg School of Social and Behavioral Sciences
Tilburg University

Phone: +31 13 466 2610

On February 15, 2019, Eva Zijlmans defended her thesis, titled:

Item-Score Reliability – Estimation and Evaluation

In psychology and education, tests are used to measure intelligence and school
performance. Test scores are used to make decisions about individuals (who is
admitted to a particular school level or hired for a job?) and have an impact on
people’s lives as well as on schools and organizations. Thus, test scores must be
reliable to guarantee that decisions based on them are correct. Reliability is the
degree to which retesting a person provides the same result. In practice, retesting
the same persons to determine reliability is unrealistic, because memory and other
unwanted effects will influence the test result. Estimation of a test score’s
reliability is therefore based on the test results of a sample of people who took the
test just once. This approach has produced several methods to estimate the
reliability of the test score.
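
As an illustration of estimating reliability from a single test administration, the sketch below computes coefficient alpha (Cronbach's alpha), a widely used test-score reliability estimate. The text above does not name this method; it is chosen here only as a familiar example and is not one of the item-score reliability methods developed in the dissertation. The function name and the small data set are hypothetical.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a persons-by-items score table.

    `scores` is a list of rows, one row per person and one column per item.
    This is a generic textbook estimate, used only as an illustration.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Sample variance of each item score across persons.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of the total (sum) score across persons.
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 persons, 4 items scored 0 (incorrect) or 1 (correct).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = coefficient_alpha(data)
print(round(alpha, 3))
```

Each person is tested only once; the estimate relies on how strongly the item scores vary together relative to the variance of the total score.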

Methods for estimating the reliability of a test score all relate to a test consisting of
multiple items (problems to be solved, questions to be answered). However,
individual items also must have high reliability, and thus it is important to assess the
reliability of a single item, that is, the item-score reliability. Until now, items have been
assessed using indices that address aspects of item quality other than reliability, but
methods to assess item-score reliability were scarcely available and thus had to be
developed and their performance evaluated. This was the topic of this dissertation.

In this dissertation, methods for estimating item-score reliability were developed and
the usability of these methods was evaluated. First, reliability methods based on test
scores were used as a basis for developing methods for estimating item-score
reliability. These methods were evaluated in controlled studies using simulated data.
Three promising methods resulted. In a second study, these three item-score
reliability methods were used to estimate the item-score reliability in several
empirical data sets. The resulting values were compared to values of item indices
assessing other aspects of item quality. The relation between the three item-score
reliability methods and the other item indices was investigated in a third study using
simulated data. In a final study, the usability of item-score reliability for selecting or
rejecting items based on their contribution to test-score reliability was investigated.

The studies in this dissertation show that item-score reliability methods provide
insight into the quality of an item and help to decide whether an item should be
included in the test. The relationship between item-score reliability and other
aspects of item quality was also investigated. Our methods may contribute to the
improvement of psychological and educational tests.

Supervisors
Prof. Dr. K. Sijtsma, Dr. J. Tijmstra & Dr. L.A. van der Ark

Financed by
Tilburg University