Prof. D.I. Boomsma, Dr J. Vink & Prof. C.V. Dolan
On February 18th 2016, Camelia Minica will defend her thesis entitled
Family-Based Genetic Association Analysis: Methods and Applications to Addiction Phenotypes
Multivariate data may confer power advantages in GWAS, yet they require modeling choices. Chapter II compared the efficiency (in terms of power) of several analytic strategies for detecting a genetic variant in multivariate phenotypic data. Twin data were simulated to fit each of the following five models exactly: 1) a single common genetic factor model, 2) a correlated common genetic factors model, 3) a latent regression model, 4) a hybrid simplex (AE) and factor (C) model, and 5) a stationary double simplex (AE) model. The effect of the genetic variant on all or a subset of the phenotypes was mediated by the common genetic factor(s). In the twin 1 data, the following analytic strategies were considered: a) univariate tests in which each phenotype was regressed on the genetic variant (single-phenotype ANOVA); b) univariate tests based on sum scores (ANOVA); c) exploratory factor analysis (EFA); and d) multivariate tests based on MANOVA. Power calculations were based on the non-centrality parameter (NCP). Results demonstrated that: a) the sum score ANOVA and EFA were the most powerful strategies when the genetic effect was general, i.e., propagated to all phenotypic indicators, while MANOVA was the least powerful in this circumstance; b) MANOVA and EFA were particularly powerful when the genetic variant was propagated to a subset of the phenotypes, and their power increased with increasing phenotypic correlations; c) the NCPs of MANOVA and EFA were equal across all scenarios, indicating that the differences in power between the two strategies arose from differences in degrees of freedom.
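NCP-based power calculations rest on a standard result: for a chi-square test with df degrees of freedom at level α, power is the probability that a noncentral chi-square variable with the given NCP exceeds the central critical value. The Python sketch below is our own illustration, not the code used in the thesis. It relies on the standard library only and uses the Poisson-mixture form of the noncentral chi-square. The function names are hypothetical. The sketch illustrates the point made above for MANOVA and EFA: with the NCP held fixed, power still depends on the degrees of freedom of the test.

```python
import math


def chi2_sf(x: float, df: float) -> float:
    """Upper-tail probability P(X > x) of a central chi-square with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def noncentral_chi2_sf(x: float, df: float, ncp: float) -> float:
    """P(X > x) for a noncentral chi-square, written as a Poisson(ncp / 2)
    mixture of central chi-squares with df + 2j degrees of freedom."""
    if ncp == 0.0:
        return chi2_sf(x, df)
    half = ncp / 2.0
    total, weight_sum = 0.0, 0.0
    for j in range(10_000):
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))  # Poisson weight
        total += w * chi2_sf(x, df + 2 * j)
        weight_sum += w
        if weight_sum >= 1.0 - 1e-12 and j > half:
            break
    return total


def chi2_critical(df: float, alpha: float = 0.05) -> float:
    """Critical value c such that P(chi2_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_from_ncp(ncp: float, df: float, alpha: float = 0.05) -> float:
    """Power of a df-degree-of-freedom chi-square test given its NCP."""
    return noncentral_chi2_sf(chi2_critical(df, alpha), df, ncp)
```

For example, comparing `power_from_ncp(10.0, 1)` with `power_from_ncp(10.0, 5)` shows that the test with more degrees of freedom has lower power at an identical NCP. With `ncp=7.849` and `df=1`, the function returns a power of about 0.80, the familiar benchmark for α = 0.05.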
Family-based genotype imputation has been proposed as a means of increasing power in GWAS, as it allows individuals with observed phenotypes but missing genotypes to be included in the association analysis. Chapter III considered factors affecting the power to detect genetic association following family-based genotype imputation. The study focused on sibships of sizes 2 to 4, in which imputation was informed by 1 sibling, or by 1 sibling and 1 parent. Monte Carlo simulations were used to compare the power of the mixture approach (which uses the full distribution of the imputed genotypes) with that of the dosage approach (in which the mean of the conditional distribution serves as the imputed genotype). Furthermore, the effect of misspecifying the familial covariance matrix on power and type I error rates was considered for traits of low, moderate and high heritability. The misspecification consisted of using an exchangeable model, which accounts for the sibling correlations by means of a single correlation (a model that is also of interest for computational reasons). Finally, the simulation results were verified in two empirical datasets. Results showed that: a) the power differences between the dosage and mixture approaches are small, so the dosage approach is recommended because it is computationally simpler; b) correct model specification is desirable, particularly when the trait is highly heritable, in order to obtain correct type I error rates; c) family-based imputation yields considerable power gains only in specific circumstances.
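To make the dosage/mixture distinction concrete, the sketch below computes the conditional genotype distribution of an ungenotyped sibling given one genotyped sibling. It rests on our own assumptions, not necessarily those of the thesis: a biallelic marker, Hardy-Weinberg equilibrium with a known allele frequency q, and the standard sibling IBD-sharing probabilities of 1/4, 1/2 and 1/4 for sharing 0, 1 or 2 alleles. The dosage is the mean of this distribution, which under these assumptions works out to q + g_obs/2. The mixture approach keeps all three probabilities. This is illustrated with a hypothetical single-individual normal likelihood that ignores the familial covariance modeled in the thesis. All function names are illustrative.

```python
import math
from typing import List

IBD_PROBS = (0.25, 0.5, 0.25)  # P(two full sibs share 0, 1, 2 alleles IBD)


def hwe_probs(q: float) -> List[float]:
    """Probabilities of 0, 1, 2 copies of the coded allele under Hardy-Weinberg equilibrium."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def sibling_genotype_probs(g_obs: int, q: float) -> List[float]:
    """P(g_sib = 0, 1, 2 | genotype g_obs of one observed sibling)."""
    k0, k1, k2 = IBD_PROBS
    pop = hwe_probs(q)                      # 0 alleles IBD: sibling drawn from the population
    s = g_obs / 2.0                         # 1 allele IBD: chance the shared allele is the coded one
    one_ibd = [(1 - s) * (1 - q), s * (1 - q) + (1 - s) * q, s * q]
    two_ibd = [1.0 if g == g_obs else 0.0 for g in range(3)]  # 2 alleles IBD: identical genotype
    return [k0 * pop[g] + k1 * one_ibd[g] + k2 * two_ibd[g] for g in range(3)]


def dosage(probs: List[float]) -> float:
    """Dosage approach: the mean of the conditional genotype distribution."""
    return sum(g * p for g, p in enumerate(probs))


def normal_logpdf(y: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y - mean) ** 2 / (2 * sd ** 2)


def mixture_loglik(y: float, probs: List[float], mu: float, beta: float, sd: float) -> float:
    """Mixture approach: log of the phenotype density averaged over the possible genotypes."""
    terms = [math.log(p) + normal_logpdf(y, mu + beta * g, sd) for g, p in enumerate(probs) if p > 0]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def dosage_loglik(y: float, probs: List[float], mu: float, beta: float, sd: float) -> float:
    """Dosage approach: plug the expected genotype into the regression mean."""
    return normal_logpdf(y, mu + beta * dosage(probs), sd)
```

When the genetic effect `beta` is zero, the two log-likelihoods coincide. They diverge as `beta` grows, which is where the two approaches can differ in power.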
Full, correct modeling of the conditional familial covariance matrix confers power advantages and yields correct type I error rates. Yet correct modeling can be complicated and prone to misspecification when families vary in size and composition. Model misspecification, as discussed in Chapter III, is also of interest for computational reasons. Chapter IV focused on the effect of misspecifying the familial covariance matrix on power, and considered several sandwich corrections of the standard errors to ensure correct type I error rates in family-based GWASs. Specifically, the performance of the unweighted least squares (ULS) and maximum likelihood (ML) estimators was compared given: a) AE and ACE traits simulated in families comprising 4 siblings (2 MZ/DZ twins and 2 non-twin siblings), with and without parents, and b) various background correlations. Results demonstrated that the extreme misspecification employed by the sandwich-corrected ULS procedure implemented in Plink leads to a dramatic loss of power given moderate to large background correlations. Furthermore, the fast ML procedure was shown to be equally amenable to a sandwich correction. To analyze A(C)E traits in samples of families varying in size and composition (where full, correct modeling is complicated and prone to misspecification), a misspecified CE/AE linear mixed model in combination with a sandwich correction is likely to
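The combination of a deliberately simplified working covariance with a sandwich correction can be sketched as follows. This is a minimal illustration under our own assumptions, not the estimator of the thesis or Plink's implementation. It fits a single-SNP regression y = b0 + b1 g by generalized least squares with an exchangeable working correlation, meaning one correlation rho (assumed 0 <= rho < 1) for all members of a family, as in the exchangeable model of Chapter III. It then computes a family-clustered sandwich (robust) standard error for b1. Setting rho = 0 gives ordinary least squares with a clustered sandwich standard error, an analogue of the sandwich-corrected ULS approach. The function names and toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Family = Tuple[Sequence[float], Sequence[float]]  # (genotypes, phenotypes) of one family


def exch_inv_apply(v: Sequence[float], rho: float) -> List[float]:
    """Return R^{-1} v for the exchangeable correlation matrix R = (1 - rho) I + rho J."""
    m = len(v)
    c = rho / (1.0 + (m - 1) * rho)  # closed-form inverse: (I - c J) / (1 - rho)
    s = sum(v)
    return [(vi - c * s) / (1.0 - rho) for vi in v]


def inv2(a: List[List[float]]) -> List[List[float]]:
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def matmul2(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def fit_exchangeable_gls(families: Sequence[Family], rho: float) -> Tuple[List[float], float]:
    """GLS fit of y = b0 + b1 * g with working correlation rho; returns (beta, sandwich SE of b1)."""
    xtwx = [[0.0, 0.0], [0.0, 0.0]]  # sum over families of X' R^{-1} X
    xtwy = [0.0, 0.0]                # sum over families of X' R^{-1} y
    per_family = []
    for g, y in families:
        cols = ([1.0] * len(g), [float(x) for x in g])  # design columns: intercept, genotype
        w = [exch_inv_apply(col, rho) for col in cols]  # columns of R^{-1} X
        for r in range(2):
            for c in range(2):
                xtwx[r][c] += sum(wi * xi for wi, xi in zip(w[r], cols[c]))
            xtwy[r] += sum(wi * yi for wi, yi in zip(w[r], y))
        per_family.append((cols[1], y, w))
    bread = inv2(xtwx)
    beta = [bread[r][0] * xtwy[0] + bread[r][1] * xtwy[1] for r in range(2)]
    # Meat: sum over families of u u', with u = X' R^{-1} (y - X beta).
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for g, y, w in per_family:
        resid = [yi - beta[0] - beta[1] * gi for gi, yi in zip(g, y)]
        u = [sum(wi * ri for wi, ri in zip(w[k], resid)) for k in range(2)]
        for r in range(2):
            for c in range(2):
                meat[r][c] += u[r] * u[c]
    cov = matmul2(matmul2(bread, meat), bread)  # sandwich: bread * meat * bread
    return beta, math.sqrt(max(cov[1][1], 0.0))


if __name__ == "__main__":
    # Toy data: families of 4 with a shared background effect. Genotypes are drawn
    # independently within families, which is unrealistic for siblings.
    rng = random.Random(2016)
    fams = []
    for _ in range(200):
        shared = rng.gauss(0.0, 1.0)
        g = [rng.choice([0, 1, 2]) for _ in range(4)]
        fams.append((g, [0.2 * gi + shared + rng.gauss(0.0, 1.0) for gi in g]))
    for rho in (0.0, 0.5):
        b, se = fit_exchangeable_gls(fams, rho)
        print(f"rho={rho}: b1={b[1]:.3f}, sandwich SE={se:.3f}")
```

Because the sandwich variance is unchanged when the working covariance is multiplied by a constant, only the correlation rho needs to be specified. The sandwich keeps the standard error valid when rho is wrong. A working correlation closer to the true background correlation can, however, give a more efficient estimate of b1, which relates to the power loss described above for the ULS extreme.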
On modeling genetic association with addiction phenotypes
My PhD project aims to identify genes and gene networks associated with individual differences in the liability to substance use and abuse. A second focus of my project is to investigate whether the genetic factors involved in addiction have substance-specific effects. Thirdly, I will study, and implement in my analyses, alternative methods of increasing the power of genome-wide association studies. To fulfill these aims I will make use of the vast wealth of phenotypic and genotypic data of the Netherlands Twin Register.
To reliably identify susceptibility loci involved in experimental and regular substance use, I will use and develop state-of-the-art methods such as genome-wide association (GWA) analyses and candidate gene approaches. The relationship between measured genetic markers and the measured complex phenotypes will be studied using developmentally realistic latent class models, including mixtures of growth curve models (with regime switching), Markov models, survival models and pathway analysis.
Because the phenotypes of interest are complex and require relatively large samples for genetic associations to be detected, I will investigate alternative ways of increasing the power to detect genetic association. For instance, I will examine the power advantages conferred by including family-based imputed genotypes in the association analysis. We will also combine our results with those of other research groups worldwide, for example in meta-analyses, to increase power and to replicate our findings.