Soogeun Park

Social and Behavioral Sciences
Methodology and Statistics
Tilburg University

Personal webpage Soogeun Park

Project

Big Data in the Social Sciences: Statistical methods for multi-source high-dimensional data
Social science research has entered the era of big data: many detailed measurements are
taken and multiple sources of information are combined to unravel complex multivariate
relations. For example, in studying obesity as the outcome of environmental and genetic
influences, researchers increasingly collect survey, dietary, biomarker and genetic data from
the same individuals. Such integrated research can inform health strategies to prevent
obesity.

Although linked multi-source data with more variables than samples (called high-dimensional
data) form an extremely rich resource for research, extracting meaningful, integrated
information from them is challenging and not adequately addressed by current statistical
methods. A first problem is that relevant information is hidden among a bulk of irrelevant
variables, with a high risk of finding incidental associations. Second, the sources are often
very heterogeneous, which may obscure the links that reflect shared mechanisms. Hence, a
statistical framework is needed that selects the relevant groups of variables within each
source and links them across data sources.

Principal component methods are particularly powerful for high-dimensional data. In this
project, I will contribute to the development of a new framework by extending principal
component analysis to common components defined by relevant clusters of variables. We will
use this framework both for exploration and for outcome modelling of linked high-dimensional
social science and epigenetic data. The results of this project will be relevant for any
researcher confronted with linked high-dimensional data. The resulting component analysis
method will be a novel, widely applicable tool for knowledge extraction that also allows more
accurate predictions in many social science contexts involving big data. In addition, the
proposed empirical study will generate important insights into gene-environment interactions
in socially relevant outcomes such as obesity.

Supervisors
Prof. dr. J.K. Vermunt, prof. dr. E. Ceulemans, dr. K. van Deun

Financed by
NWO Vidi grant awarded to K. van Deun (2015)

Period
1 September 2017 – 31 August 2021