Coordinator | Bennett A.R. Kleinberg |
Venue / Dates | Tilburg University, October 2025 (dates to be announced) |
Mandatory/Elective | Elective |
ECTS | 1 |
Registration | secretariaat@iops.nl. Registration deadline: September 15, 2025 |
Fee | IOPS-members: free |
Abstract | The past ten years have seen landmark advances in natural language processing and growing interest in the use of textual data in psychological and social science research. Textual data such as social media posts, narratives, patient records and free-text survey responses hold promise for the measurement of constructs and for predictive models (e.g., to predict patient progression from diary data). At the core of current (generative) language models are fundamental statistical natural language processing techniques for studying textual data quantitatively. Early approaches to text representation largely relied on term frequency models. A shift to large-scale data (e.g., billions of documents) and the adoption of machine learning have enabled embedding representations that capture the context and semantics of text data, a tectonic shift in how textual data are studied, represented and used computationally. This course will provide a conceptual, theoretical and statistical basis for modern natural language processing techniques and the understanding needed for advanced natural language processing and generative (artificial intelligence) models such as large language models. The course will make participants confident in handling textual data and in applying the concepts learned in R. Programme outline: Part 1: Foundations of textual data and text representations (theory and practice); Part 2: The world of text embeddings (theory and practice); Part 3: Natural language processing, machine learning and generative language models (theory and practice). |
Examination | Project |