Balázs, Katalin

Detecting heterogeneity in logistic regression models


Katalin Balázs PhD

Project: K.U.Leuven, Belgium

Supervisor: Prof. dr Paul De Boeck

Project running from: 1 November 2002 – 1 November 2006

The aim of the research project is to identify and systematically examine methods for detecting heterogeneity in binary item-by-person data with item covariates. Several sources of heterogeneity can be distinguished: the person, the item, or both. The heterogeneity indicators can be either parametric or nonparametric. The research plan includes comparisons of methods for detecting heterogeneity, in order to identify their strengths and weaknesses.
In the first study, several methods were tested for detecting dimensional, person-based heterogeneity in a logistic regression model with item covariates.

  • The individual-analysis and mean-deviance methods turned out to have specific problems.
  • Principal component analysis (PCA) and Alternating Logistic Regression (ALR), a generalized estimating equations (GEE) approach, both succeeded in estimating the heterogeneity of the data. The PCA approach will not be pursued further, because it often produces artefacts. For ALR, we will try to overcome its requirement that the item features be known in advance by adding an additive clustering algorithm as a preliminary step.
  • DIMTEST and DETECT, two popular nonparametric methods for assessing the dimensionality of data, performed well in this study when the DETECT procedure was applied with an ad hoc decision rule that differs from those defined in its manual. The optimal decision rule will be investigated further through simulation studies, in order to identify the factors affecting its value.
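The setting of the first study can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not one of the methods examined in the project: it assumes a model in which the log-odds of a correct response are a person ability plus item-covariate effects, and it represents dimensional, person-based heterogeneity as person-specific random deviations in a covariate effect. The function names `simulate_responses` and `mean_within_corr`, and all parameter values, are assumptions chosen for illustration.

```python
import numpy as np


def simulate_responses(n_persons, Q, beta, sd_theta=1.0, sd_slope=0.0, seed=0):
    """Simulate binary item-by-person data from a logistic model with item covariates.

    Assumed (hypothetical) model for person p and item i:
        logit P(Y_pi = 1) = theta_p + sum_k (beta_k + b_pk) * Q_ik
    Q:        (n_items, K) item covariate matrix.
    beta:     (K,) fixed effects of the item covariates.
    sd_slope: SD of the person-specific deviations b_pk; 0 gives a
              homogeneous model, > 0 adds person-based heterogeneity.
    """
    rng = np.random.default_rng(seed)
    n_items, K = Q.shape
    theta = rng.normal(0.0, sd_theta, size=n_persons)        # person ability
    slopes = rng.normal(0.0, sd_slope, size=(n_persons, K))  # person-specific deviations
    eta = theta[:, None] + (beta + slopes) @ Q.T             # (n_persons, n_items) log-odds
    p = 1.0 / (1.0 + np.exp(-eta))                           # response probabilities
    return (rng.uniform(size=p.shape) < p).astype(int)       # Bernoulli draws


def mean_within_corr(Y, items):
    """Mean off-diagonal correlation among the given item columns of Y."""
    R = np.corrcoef(Y[:, items], rowvar=False)
    k = len(items)
    return (R.sum() - k) / (k * (k - 1))


if __name__ == "__main__":
    # Hypothetical design: items 0-4 lack a feature, items 5-9 have it.
    Q = np.array([[0.0]] * 5 + [[1.0]] * 5)
    Y = simulate_responses(2000, Q, beta=np.array([-0.5]), sd_slope=1.5, seed=1)
    print("without feature:", mean_within_corr(Y, list(range(5))))
    print("with feature:   ", mean_within_corr(Y, list(range(5, 10))))
```

With a nonzero `sd_slope`, the items that share the feature pick up an extra common source of variation, so their mean inter-item correlation should exceed that of the items without it. Excess covariance of this kind is the sort of signal that dimensionality-oriented methods such as PCA, DIMTEST and DETECT are designed to pick up.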

Finally, we will concentrate on a non-dimensional type of heterogeneity, in which some observations induce heterogeneity in other observations. Procedures for detecting local item dependence (LID) will be compared with the methods mentioned above.

Date of defence: 24 May 2007

Title of thesis: Detecting heterogeneity in logistic regression models