On October 25, 2019, Laura Boeschoten defended her thesis “Consistent estimates for categorical data based on a mix of administrative data sources and surveys”.


When producing official statistics, Statistics Netherlands (CBS) uses existing administrative sources as much as possible. However, sometimes there is interest in a statistic on a subject that is not measured in these sources. In that case, the information is obtained through surveys.

Neither administrative sources nor surveys are perfect; both contain measurement errors of various kinds. This dissertation introduces a method that tackles several problems related to those measurement errors simultaneously.

First, the quality of the various sources is estimated. On the one hand, this is done by investigating inconsistencies between variables that measure the same concept but originate from different sources. On the other hand, improbable or impossible combinations of scores on different variables are examined. For example, the combination of “age = younger than 5 years” and “marital status = married” is impossible because marriage at that age is prohibited by law.
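The two kinds of checks described above can be illustrated with a minimal sketch. It is not the estimation method of the thesis; it only counts conflicts between two sources and flags an impossible combination. The records, variable names, and the `IMPOSSIBLE` rule set are hypothetical and chosen purely for illustration.

```python
# Hypothetical linked records: marital status is measured both in a register
# and in a survey; the age group is taken from the register.
records = [
    {"age_group": "under 5", "marital_register": "married", "marital_survey": "unmarried"},
    {"age_group": "30-39", "marital_register": "married", "marital_survey": "married"},
    {"age_group": "30-39", "marital_register": "unmarried", "marital_survey": "married"},
    {"age_group": "60-69", "marital_register": "married", "marital_survey": "married"},
]

# Assumed rule: combinations of (age group, marital status) that cannot occur.
IMPOSSIBLE = {("under 5", "married")}


def disagreement_rate(rows, var_a, var_b):
    """Share of records in which two measures of the same concept conflict."""
    conflicts = sum(1 for row in rows if row[var_a] != row[var_b])
    return conflicts / len(rows)


def impossible_rows(rows, age_var, status_var):
    """Indices of records whose score combination violates a rule."""
    return [
        index
        for index, row in enumerate(rows)
        if (row[age_var], row[status_var]) in IMPOSSIBLE
    ]


rate = disagreement_rate(records, "marital_register", "marital_survey")
flagged = impossible_rows(records, "age_group", "marital_register")
print(rate, flagged)
```

In this toy data, two of the four records disagree between the sources, and the first record is flagged because the register reports a married child under 5.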

Secondly, statistics are produced that are corrected for the estimated measurement error. These statistics are consistent: when a crosstab of “education level X gender X region” is produced alongside a crosstab of “education level X gender X marital status”, the total number of highly educated men, for example, is exactly equal in both tables. In addition, the statistics come with variance estimates that incorporate the uncertainty due to missing and conflicting values in the original sources.
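The consistency property can be made concrete with a small sketch. It assumes, purely for illustration, that both crosstabs are tallied from one and the same corrected microdata set; the text above does not describe how consistency is achieved, and the data and the `margin` helper are hypothetical.

```python
from collections import Counter

# Hypothetical corrected microdata: (education, gender, region, marital status).
people = [
    ("high", "male", "north", "married"),
    ("high", "male", "south", "unmarried"),
    ("low", "female", "north", "married"),
    ("high", "female", "south", "married"),
    ("high", "male", "north", "married"),
]

# Crosstab 1: education level X gender X region.
table_region = Counter((edu, sex, region) for edu, sex, region, _ in people)
# Crosstab 2: education level X gender X marital status.
table_marital = Counter((edu, sex, marital) for edu, sex, _, marital in people)


def margin(table, education, gender):
    """Sum a three-way table over its third variable for one education/gender cell."""
    return sum(
        count
        for (edu, sex, _), count in table.items()
        if edu == education and sex == gender
    )


men_high_region = margin(table_region, "high", "male")
men_high_marital = margin(table_marital, "high", "male")
print(men_high_region, men_high_marital)
```

Because both tables are derived from the same records, the number of highly educated men is identical in each. Tables built from separately corrected sources would not carry that guarantee.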

Prof. J.K. Vermunt, Prof. A.G. de Waal & Dr D.L. Oberski

Financed by
Tilburg University and Statistics Netherlands