Schepers, Jan

Real-valued HICLAS models

Jan Schepers PhD

Project: K.U.Leuven, Belgium

Supervisors: Prof. dr Iven Van Mechelen

Project running from: 1 October 2003 – 30 September 2007

Hierarchical classes (HICLAS) models constitute a distinct family of classification models for N-way N-mode data that imply a simultaneous clustering of each of the modes in the data. Each clustering reflects a quasi-order between the elements of the corresponding data mode, while together the clusterings yield reconstructed data that approximate the actual data as closely as possible. Up to now, the family of hierarchical classes models has been limited in scope with respect to the range of values allowed in the data (D) and the reconstructed data (M), which can be taken either from the binary set {0, 1} or from a (limited) subset of the set of natural numbers.
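As a rough illustration of what "reconstructed data" and "quasi-order" can mean in the binary two-way case, the sketch below assumes the commonly described disjunctive formulation, in which M is the Boolean product of two binary bundle matrices. The matrices A and B, their values, and the helper is_below are hypothetical and chosen only for illustration; this is not the project's estimation algorithm.

```python
import numpy as np

# Hypothetical binary bundle matrices for a rank-2 model (illustrative values only).
A = np.array([[1, 0],
              [1, 1],
              [0, 1]])   # 3 row elements x 2 bundles
B = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [0, 0]])   # 4 column elements x 2 bundles

# Boolean matrix product: M[i, j] = 1 iff row i and column j share at least one bundle.
M = (A @ B.T > 0).astype(int)

def is_below(M, i, k):
    """Return True if the set of 1s in row i of M is contained in that of row k."""
    return bool(np.all(M[i] <= M[k]))

print(M)
print(is_below(M, 0, 1), is_below(M, 0, 2))
```

Under this assumption, containment between the reconstructed row patterns gives the order relation on the row elements; the column mode can be treated the same way by applying is_below to M.T.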

The expansion to real-valued HICLAS models may be valuable for two reasons. First, in some cases it may be desirable to allow for positive reals in M, even though D is not real-valued. For example, modeling a binary data set with reconstructed data values in [0, 1] naturally allows the reconstructed data to be interpreted as conditional probabilities of observing a 1. Second, in several domains of psychological research, the observed data are not categorical but are values on a continuous scale. Examples include response times, the intensity of brain waves, and muscle tension. To deal with this type of data properly, it is desirable to approximate them by reconstructed data that take values from the same set.
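The probability interpretation in the first reason can be made concrete with a deliberately simplified sketch. It assumes plain, non-overlapping row and column partitions rather than a hierarchical classes structure, and the data matrix, cluster labels, and function name block_means are all hypothetical. Each block of a binary D is reconstructed by its mean, which is the least-squares optimal constant for that block and equals the proportion of 1s in it.

```python
import numpy as np

# Hypothetical binary data and cluster memberships (illustrative values only).
D = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])
row_cluster = np.array([0, 0, 1, 1])
col_cluster = np.array([0, 0, 1, 1])

def block_means(D, rows, cols):
    """Reconstruct D by replacing each (row cluster, column cluster) block with its mean."""
    M = np.zeros(D.shape, dtype=float)
    for r in np.unique(rows):
        for c in np.unique(cols):
            block = np.ix_(rows == r, cols == c)
            M[block] = D[block].mean()  # proportion of 1s in the block, a value in [0, 1]
    return M

M_hat = block_means(D, row_cluster, col_cluster)
print(M_hat)
```

Each entry of M_hat can then be read as the estimated probability of observing a 1, conditional on the row and column clusters of that cell.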

In this project, we will expand the family of hierarchical classes models to allow for real-valued reconstructed data. More specifically, we will work on the formulation and the mathematical study of new, real-valued HICLAS models for two- and three-way data, and we will construct and evaluate appropriate algorithms for the associated model estimation.

Date of defence: 22 February 2008

Title of thesis: Real-valued clustering methods for N-way N-mode data