Methodology and Statistics
Tilburg School of Social and Behavioral Sciences
Prof.dr. J.M. Wicherts and Prof.dr. J.K. Vermunt
On March 7th, 2018, Paulette Flore will defend her thesis entitled
Stereotype Threat and Differential Item Functioning: A Critical Assessment
Do gender stereotypes lead to performance decrements on math tests for girls or women? For the last two decades, psychologists across the world have tried to answer this question using experiments. In these experiments, one group of students is exposed to stereotype threat before taking a math test. Stereotype threat can be induced in different ways, for instance by informing participants that “boys and girls do not perform equally well on this math test”. In a control condition, a second group of students does not get to read this, or is informed that “boys and girls perform equally well on this math test”. Female students often underperform on a math test when they are exposed to stereotype threat, while male students are not influenced.
In my dissertation we study the stereotype threat literature and popular research methods with a critical eye. We need to be critical, because some problems in the psychological literature could have distorted research findings in the past. Examples are publication bias (results are biased by selectively publishing studies with exciting results), a lack of replicability (being able to replicate the findings of the original study by means of a new study), and a lack of reproducibility (coming to the same conclusions as the original researchers by reanalyzing the existing dataset). Moreover, stereotype threat researchers mostly study whether performance decrements occur in average math test scores. In my dissertation I go beyond averages and study group differences caused by stereotype threat for specific math questions. With statistical models we study whether girls influenced by stereotype threat score lower on specific math questions than girls in the control condition, controlling for math ability. Such item-level differences are called Differential Item Functioning (DIF).
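To make the idea of DIF concrete, the sketch below computes a Mantel-Haenszel common odds ratio for a single math question. This is one widely used DIF statistic and is shown only as an illustration; it is not necessarily the method used in the dissertation. The function name, the group labels "control" and "threat", the use of a matching score as a stand-in for math ability, and the toy records are all assumptions made for this example.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, matching_score, correct) tuples, where
    group is "control" (reference) or "threat" (focal), matching_score is
    a proxy for math ability (for example the total test score), and
    correct is 1 if the item was answered correctly, else 0.
    Returns None when the ratio is undefined (zero denominator).
    """
    # One 2x2 table per ability stratum: counts[group][correct]
    strata = defaultdict(lambda: {"control": [0, 0], "threat": [0, 0]})
    for group, score, correct in records:
        strata[score][group][correct] += 1

    numerator = 0.0
    denominator = 0.0
    for table in strata.values():
        a = table["control"][1]  # control, correct
        b = table["control"][0]  # control, incorrect
        c = table["threat"][1]   # threat, correct
        d = table["threat"][0]   # threat, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        numerator += a * d / n
        denominator += b * c / n

    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical toy data, invented purely to show the call.
toy_records = [
    ("control", 1, 1), ("control", 1, 1), ("control", 1, 0),
    ("threat", 1, 1), ("threat", 1, 0), ("threat", 1, 0),
    ("control", 2, 1), ("control", 2, 1), ("control", 2, 0),
    ("threat", 2, 1), ("threat", 2, 1), ("threat", 2, 0),
]
print(mantel_haenszel_odds_ratio(toy_records))
```

A ratio close to 1 suggests that students of equal ability have similar odds of answering the question correctly in both conditions. A ratio clearly above 1 suggests the question is harder for students in the threat condition than for equally able students in the control condition.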
In Chapter 2 of my dissertation we summarize existing stereotype threat studies conducted in elementary, middle and high schools across the globe by means of a meta-analysis. We found a negative influence of stereotype threat on math performance, even though the differences between the groups were small. Tests for publication bias implied that the results are somewhat distorted due to selective publishing. In Chapter 3 we carried out a large stereotype threat replication study in Dutch high schools. More than 2,000 students participated in this study. We did not find evidence for a stereotype threat effect on math performance in this study. In Chapter 4 we review the DIF methods used and the reporting practices in 200 articles. We conclude that the amount of detail in reports on DIF analyses is often insufficient, which is problematic for reproducibility. It is striking that researchers who study DIF with multiple statistical methods often find divergent results. Finally, in Chapter 5 we reanalyze data from 10 stereotype threat experiments. We found no systematic differences in stereotype threat effects between difficult and easy questions. The number of unanswered math questions was high in some of the studies, which reflects the strong time pressure students had to work under. As an alternative explanation for performance decrements, we suggest that female students in the stereotype threat condition work more slowly or give up more easily than female students in the control condition. A DIF analysis on our own dataset does not show any differences in performance on specific items between the female students in the different experimental groups. We recommend that researchers and policy makers be critical when interpreting outcomes in the stereotype threat and DIF literature. In the future, large-scale systematic replication studies could answer many of the pending questions regarding the stereotype threat effect.
The psychometrics of stereotype threat
Stereotype threat refers to a form of test anxiety that is attributed to the test taker’s fear of inadvertently confirming a negative stereotype concerning his or her group (e.g., “females can’t do mathematics”). Based on theory and empirical results, we develop an advanced item response model for studying this effect as a source of differential item functioning in mathematics tests. The formal model enables comprehensive tests of characteristics that render items and persons susceptible to the effects of stereotype threat. The new model is applied to experimental data and test data from CITO. Results contribute to substantive theory and test development.
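As a rough illustration of how differential item functioning can enter an item response model, consider a Rasch-type model with an item-specific shift for the stereotype threat condition. This is a textbook-style sketch under assumed notation, not the project's actual model: \theta_p is the ability of person p, b_i the difficulty of item i, g_p an indicator that equals 1 if person p was in the threat condition and 0 otherwise, and \delta_i the hypothetical threat-related shift in difficulty for item i.

```latex
P(X_{pi} = 1 \mid \theta_p, g_p)
  = \frac{\exp\left(\theta_p - b_i - \delta_i g_p\right)}
         {1 + \exp\left(\theta_p - b_i - \delta_i g_p\right)}
```

Under this sketch, \delta_i = 0 means item i functions the same way in both conditions, while \delta_i > 0 means the item is harder under stereotype threat for persons of equal ability.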
NWO Talent Grant