Maarten Kampert

Mathematical Institute
Leiden University

Voice: +31 71 527 7130
E-mail: Maarten Kampert

On 3 July 2019, Maarten Kampert defended his thesis "Improved Strategies for Distance Based Clustering of Objects on Subsets of Attributes in High-Dimensional Data" at Leiden University (Methods and Statistics).

Summary
This monograph focuses on clustering objects in high-dimensional data under the restriction that the objects do not cluster on all the attributes, nor on a single subset of attributes, but often on different subsets of attributes in the data. To reveal such a clustering structure, Friedman and Meulman (2004) proposed a framework and a specific algorithm, called COSA. In this monograph we propose three improvements to the original COSA algorithm. The first targets the optimization strategy for the tuning parameters in COSA. The second is a reformulation of the COSA criterion that reduces the number of tuning parameters from two to one, enables the incorporation of pre-specified initial weights for the attribute distances, and allows solutions with zero-valued attribute weights. The third is a new definition of the COSA distances that yields a better separation between objects from different clusters. We compared the 'old' and the improved COSA with other state-of-the-art methods, using both simulated and real omics data sets.

Supervisors
Prof. dr. J.J. Meulman (Leiden University)
Prof. dr. W.J. Heiser (Leiden University)

Financed by
IBM / SPSS Leiden

Period
1 December 2012 – 3 July 2019