Jules Kruijswijk

Methodology and Statistics
Tilburg School of Social and Behavioral Sciences
Tilburg University

Project

On Hierarchical Structures in the Multi-Armed Bandit Problem
In the canonical multi-armed bandit (MAB) problem, a gambler stands in front of a row of slot machines (the "arms"), each with a potentially different payoff. The gambler decides in sequence which machine to play and aims to make as much profit as possible, simultaneously learning from previous observations and using that knowledge to steer future actions. To do so, the gambler needs a strategy that dictates which arm to play next given the previous observations. The strategies developed over the years have found numerous practical applications.
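
To make the idea of a strategy concrete, the sketch below simulates one simple and well-known strategy, epsilon-greedy. It is an illustration only and not a strategy developed in this project. It assumes Bernoulli payoffs (each play pays 1 or 0), and the function name, the payoff probabilities, and the parameter values are all hypothetical choices made for the example.

```python
import random


def epsilon_greedy(payoff_probs, horizon, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy gambler on Bernoulli slot machines.

    payoff_probs: assumed win probability of each arm (unknown to the gambler).
    horizon: number of sequential plays.
    epsilon: probability of exploring a random arm instead of exploiting.
    """
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    counts = [0] * n_arms    # how often each arm has been played
    means = [0.0] * n_arms   # running average payoff observed per arm
    total_reward = 0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: play a uniformly random arm to keep learning.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: play the arm with the best observed average so far.
            arm = max(range(n_arms), key=lambda a: means[a])

        # The machine pays out 1 with the arm's probability, otherwise 0.
        reward = 1 if rng.random() < payoff_probs[arm] else 0

        # Update the running average for the arm that was played.
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward

    return counts, means, total_reward


counts, means, total_reward = epsilon_greedy([0.2, 0.5, 0.8], horizon=5000)
print("plays per arm:", counts)
print("estimated payoffs:", [round(m, 2) for m in means])
print("total reward:", total_reward)
```

The strategy treats every play as coming from the same source, so it has no notion of plays being grouped, for instance by individual.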

Notably, a very large number of (field) experiments in the social sciences can be formalized as bandit problems (for example, the randomized clinical trial). In the social sciences, we often encounter hierarchical structures (e.g., observations nested within individuals). Surprisingly, the performance of MAB strategies in the presence of such hierarchical dependencies has hardly been studied, even though traditional statistics offers plenty of research on hierarchical models. We aim to contribute to research on the multi-armed bandit problem by developing strategies that take hierarchical dependencies into account.

Supervisors
Dr Maurits Kaptein & Prof. Jeroen Vermunt

Financed by
Tilburg University, MTO

Period
1 September 2016 – 19 February 2021