Depril, Dirk

Additive clustering for two-mode data


Dirk Depril PhD

Project at: Department of Psychology, K.U.Leuven, Belgium

Supervisors: Prof. dr Jan Beirlant, Prof. dr Iven Van Mechelen

Project running from: 1 October 2004 – 15 February 2009

Two-way two-mode object by variable data often show up in statistical practice. In several contexts it may be desirable to obtain a possibly overlapping clustering of one of the modes implied by such data. For this purpose a one-mode additive clustering model has been proposed in the literature, which implies a decomposition of the data into a binary object by cluster membership matrix and a real-valued cluster by variable profile matrix. The reconstructed data values for each object are then obtained as summations of the profiles of the clusters the object belongs to.
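As a minimal illustration of this decomposition, the sketch below builds a reconstructed data matrix from a membership matrix and a profile matrix. The names A, P, M and reconstruct, as well as all numerical values, are hypothetical and chosen only for this example; they are not taken from the project itself.

```python
# Hypothetical toy example: 3 objects, 2 clusters, 2 variables.
# A is the binary object-by-cluster membership matrix (overlap is allowed).
A = [
    [1, 0],  # object 0 belongs to cluster 0 only
    [0, 1],  # object 1 belongs to cluster 1 only
    [1, 1],  # object 2 belongs to both clusters
]

# P is the real-valued cluster-by-variable profile matrix.
P = [
    [2.0, -1.0],  # profile of cluster 0
    [0.5, 3.0],   # profile of cluster 1
]


def reconstruct(A, P):
    """Return the model matrix M = A P.

    Each row of M is the sum of the profiles of the clusters
    that the corresponding object belongs to.
    """
    n_vars = len(P[0])
    return [
        [sum(a_ik * P[k][j] for k, a_ik in enumerate(row)) for j in range(n_vars)]
        for row in A
    ]


M = reconstruct(A, P)
for row in M:
    print(row)
```

Object 2 belongs to both clusters, so its reconstructed row is the elementwise sum of the two profiles.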
The initial goal of the doctoral project is twofold. First, with respect to the mathematical properties of the model, the minimal number of clusters needed to decompose a given model matrix has to be determined, and conditions under which this decomposition is unique (up to a permutation of the clusters) are to be identified. Second, algorithms need to be developed to fit the optimal model to an empirical data set, optimality being defined in the least squares sense. For this purpose a sequential fitting (SEFIT) algorithm has already been proposed by Mirkin, but information on its performance is lacking. In the present project SEFIT will be evaluated and, if necessary, new algorithms will be developed and compared to SEFIT on both simulated and benchmark real-life data.
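To make the least squares criterion concrete, the sketch below computes the squared-error loss and, for fixed profiles, chooses each object's membership pattern by exhaustive search over all binary patterns. This is not SEFIT and not an algorithm from the project; it only illustrates that, with the profiles held fixed, the loss separates over objects. All names and data values are hypothetical, and allowing the empty pattern (an object in no cluster) is an assumption made here.

```python
from itertools import product


def row_loss(x, pattern, P):
    """Squared error between data row x and the sum of the profiles selected by pattern."""
    return sum(
        (x_j - sum(a_k * P[k][j] for k, a_k in enumerate(pattern))) ** 2
        for j, x_j in enumerate(x)
    )


def total_loss(X, A, P):
    """Least squares loss of the whole model: sum of the row losses."""
    return sum(row_loss(x, a, P) for x, a in zip(X, A))


def best_memberships(X, P):
    """For fixed profiles P, pick each object's binary membership row.

    All 2**K patterns are enumerated, so this is only feasible for a
    small number of clusters K.
    """
    patterns = list(product((0, 1), repeat=len(P)))
    return [list(min(patterns, key=lambda p: row_loss(x, p, P))) for x in X]


# Hypothetical profiles and noisy data rows scattered around sums of them.
P = [[2.0, -1.0], [0.5, 3.0]]
X = [[2.1, -0.9], [0.4, 3.2], [2.6, 1.9], [0.1, -0.1]]

A_hat = best_memberships(X, P)
print(A_hat)
print(total_loss(X, A_hat, P))
```

A complete fitting procedure would also have to estimate the profiles, for instance by alternating this step with a regression of the data on the current memberships; that part is omitted here.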

Subsequently, the research will be extended with algorithmic work for the following types of models: (1) additive clustering of three-way two-mode data (INDCLUS model), (2) two-mode additive clustering, (3) hybrid models that combine a discrete clustering of one mode with a dimensional reduction of the other mode, (4) two-mode clustering with heterogeneous biclusters.

Date of defence: 8 May 2009

Title of thesis: Algorithms for additive overlapping clustering