Faculty of Social and Behavioural Sciences
University of Amsterdam
SPANC: Simultaneous Principal and Network Components model for integration of multi-source data
From a data-analytic perspective, multi-source data poses several obstacles. A general problem with a high number of variables is how to extract the relevant information hidden within a bulk of irrelevant variables. Assuming that only a small subset of variables is actually of interest, sparse modelling shrinks the contribution of many variables (some to zero) while minimising the residual sum of squares. However, because the data comes from multiple sources, groups of variables may inherently differ in their characteristics, for example in their signal-to-noise ratio. Ignoring this information leads to more shrinkage of variables from data sources with particular characteristics, even though these variables may be important for the substantive interpretation of the true model structure. Additionally, substantive interpretation requires a data-analytic model that is based on a data-generating theory. No statistical methods are currently available that can handle all these issues simultaneously.
We propose that component analysis and network analysis combined can deal with these obstacles, because the weaknesses of one are the strengths of the other. By incorporating the best of both analytic models, we add a new tool to the statistical toolbox of big data researchers.
Dr Katrijn van Deun, Dr Lourens Waldorp, Prof. Jeroen Vermunt & Prof. Denny Borsboom
Aspasia (Van Deun) & ERC (Borsboom)
February 1st 2016 – January 31st 2020