On Hierarchical Structures in the Multi-Armed Bandit Problem
In the canonical multi-armed bandit (MAB) problem, a gambler stands in front of a row of slot machines, each with a (potentially) different payoff distribution. The gambler must decide, in sequence, which machine to play, and over the course of play she aims to make as much profit as possible by learning from previous observations while using the knowledge gained to steer future actions. This requires a strategy that dictates which arm to play next given the observations so far. The strategies developed over the years have found numerous practical applications.
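To make the setting concrete, here is a minimal sketch of one classic strategy, epsilon-greedy, on a simulated bandit. The Gaussian arm means and all parameter values are illustrative assumptions, not part of the project described here:

```python
import random

def epsilon_greedy(true_means, horizon=10_000, epsilon=0.1, seed=42):
    """Play a simulated Gaussian bandit with an epsilon-greedy strategy.

    With probability `epsilon` the gambler explores a random arm;
    otherwise she exploits the arm with the highest empirical mean.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # how often each arm was played
    means = [0.0] * k     # running empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                       # explore
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)         # noisy payoff
        counts[arm] += 1
        # incremental update of the empirical mean for this arm
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

# Usage: the best arm (true mean 1.0) should attract most of the plays.
total, counts = epsilon_greedy([0.1, 0.5, 1.0])
```

Over a long horizon the strategy concentrates its plays on the arm with the highest payoff while still occasionally sampling the others, which is the learning-while-earning trade-off described above.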
Notably, a large number of (field) experiments in the social sciences can be formalized as bandit problems (the randomized clinical trial, for example). In the social sciences we often encounter hierarchical structures (e.g., observations nested within individuals). Surprisingly, the performance of MAB strategies in the presence of such hierarchical dependencies has hardly been studied, even though hierarchical models are well researched in traditional statistics. This project aims to contribute to multi-armed bandit research by developing strategies that take hierarchical dependencies into account.
Dr Maurits Kaptein & Prof. Jeroen Vermunt
Tilburg University, MTO
1 September 2016 – 31 August 2020