Sanne Peereboom

Methodology and Statistics
Social and Behavioral Sciences
Tilburg University

Assessing the artificial mind through natural language processing and psychometrics

In an era dominated by AI tools like ChatGPT, the large language models (LLMs) underlying these tools have become integral to modern society, making it essential to understand their inner workings. Trained on terabytes of text data and ever increasing in size, these models contain billions of parameters, allowing them to generate text that is often virtually indistinguishable from human-written text. However, the complexity and sheer size of these models render analytical assessment of their behavior a challenging task. The difficulty of implementing a purely analytic technique to measure model behavior necessitates interdisciplinary approaches. A behaviorist perspective is argued to be especially beneficial, given behavioral scientists' expertise in the systematic measurement and analysis of behavior (Rahwan et al., 2019).

This study ventures into the novel field of assessing the artificial mind through psychometric assessments. This approach could provide valuable insights into LLM “thought patterns”, possibly capturing subtleties that might otherwise be missed by traditional AI benchmarks and assessment methods. The objective of this research is to assess the feasibility of applying these assessments (traditionally reserved for measuring human cognitions, emotions, and behaviors) to LLMs. A key question is whether these assessments can accurately capture the inner mechanisms involved in specific psychological constructs within LLMs. Understanding if and how these constructs in LLMs differ from those in humans is equally important: not only could this reveal some of the intricacies of AI cognition, it could also provide guidance in refining model behavior.

Recent research has made strides in this direction. To name a few examples, Hagendorff (2023) describes the use of LLMs as participants in psychological experiments, coining the term “machine psychology” and describing various existing studies across a range of subdisciplines of psychology that take this approach. Pellert et al. (2023) explicitly use the term “AI psychometrics”, applying common psychometric assessments to measure psychological profiles in a number of LLMs. Argyle et al. (2022) even argue that LLMs can be used as a “silicon sample”, where generated LLM responses could eventually replace human responses in social science research.

However, the field is still nascent and faces significant challenges. For example, there is a notable absence of standardized methodologies and validity assessments of psychometric scales for LLMs. Systematic evaluations of LLM responses to these scales are also still uncommon, which limits the conclusions one may draw from their results. To address these challenges, this research is intended to adhere to high standards of psychological research methodology. We aim to establish a systematic methodological framework for the administration of psychometric assessments to LLMs, so that we may draw conclusions from their responses in a more valid and reliable manner. This strategy is crucial to the proper measurement of AI behavior by means of psychometric methods.

Dr. B. Kleinberg
Dr. I. Schwabe

2023 – 2027