Maarten Kampert

Mathematical Institute
Leiden University

Voice: +31 71 527 7130
E-mail: Maarten Kampert
Personal academic webpage

Project

Distance based analysis of (gen)omics data
In the disciplinary fields of (gen)omics, there is a great need for statistical methods that can handle a large number of correlated variables in multiple high-dimensional data sets simultaneously. In the proposed PhD research project, we will investigate to what extent we can contribute to the statistical toolbox for omics research by using a multivariate distance-based analysis approach that is based on the clustering approach implemented in COSA (clustering objects on subsets of attributes). The proposal contains a number of steps, leading to separate projects. In the first project, we will study the behavior of the existing COSA algorithm, especially with respect to the attribute weights that play a crucial role in it. We expect this will lead to various ways to improve upon the existing algorithm, resulting in COSA-NOVA. The new program will include smoothing of the weights, the use of prior knowledge, compositional PCA of COSA weights, and various alternative regularization options applied to the COSA weights. The new program will also use parallelization and include state-of-the-art visualization. In the second project, we will extend COSA in such a way that it can analyze multiple data sets simultaneously, using a semi-supervised statistical learning approach. We will call this extension MIMO-COSA, which stands for COSA with Multiple Input and Multiple Output data sets. Project 3 investigates yet another approach to COSA, which is COSA applied to subspaces. In this approach, we combine projection to a lower-dimensional subspace (to make the analysis invariant under rotation of the attributes, the dimensions in high-dimensional space) with optimal scaling of the attributes, in order to be able to deal with nominal and ordinal categorical data and with possibly nonlinear relationships among the attributes. Last, Project 4 concentrates on the application of COSA to data from so-called systems biology. In this project we will fine-tune the MIMO-COSA algorithm (resulting from Project 2), hopefully leading to MIMOSA, designed for multiple input and multiple output systems analysis.

Supervisors
Prof. dr. J.J. Meulman (Leiden University)
Prof. dr. W.J. Heiser (Leiden University)

Financed by
IBM / SPSS Leiden

Period
1 December 2012 – 1 December 2018