Earth Sciences Building, 2207 Main Mall, Room 4192
Mon 9th December 2013
Computational methods for mixed effects models in large-scale genetic and genomic studies
This will be a two-part talk describing efficient computational methods for mixed effects models in large-scale genome-wide association studies (GWASs) and RNA sequencing (RNAseq) studies.

In the first part of the talk I will focus on linear mixed models. Linear mixed models have recently attracted considerable attention as powerful and effective tools for accounting for population stratification and relatedness in GWASs. However, existing methods for calculating likelihood ratio test (LRT) statistics are computationally impractical even for moderate-sized GWASs, and many studies have to rely on approximate LRT methods. To address this issue, I present novel, computationally efficient algorithms, which we refer to as genome-wide efficient mixed model association (GEMMA), for fitting both univariate and multivariate linear mixed models and for computing LRT statistics for SNP associations in GWASs. Our methods improve on existing approximate LRT methods in computational speed, in power and correct control of type I error, and in the ability to handle more than two phenotypes. I illustrate these features on real and simulated data.

In the second part of the talk I will focus on Poisson mixed models. High-throughput sequencing is widely used in genetics and genomics and provides unprecedented insights into many basic biological questions. However, analyzing sequencing data from related individuals presents major statistical and computational challenges: the read counts generated by a sequencing experiment not only inherit Poisson noise from the sequencing machine but also display additional variation across individuals (i.e. over-dispersion). To model this over-dispersion, I propose a Poisson mixed effects model with one random-effects term to account for relatedness among individuals and an error term to account for independent noise. I present a novel and efficient posterior sampling algorithm for the model. With real RNAseq data, I show that our model is more powerful than the widely used negative binomial-based approaches in identifying differentially expressed genes.
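To make the structure of the Poisson mixed model concrete, here is a small simulation sketch in Python with NumPy. It is not the posterior sampling algorithm from the talk, which this abstract does not describe. It only simulates counts under one assumed form of the model, y_i ~ Poisson(N_i exp(mu + g_i + e_i)) with g ~ MVN(0, sigma2_g K) for relatedness and e ~ N(0, sigma2_e I) for independent noise. The read depth N_i, the intercept mu, the variance values and the sibling-pair kinship matrix K are all illustrative assumptions, as are the helper names. The point is that the two random terms push the variance of the counts well above their mean, whereas with both terms switched off the variance-to-mean ratio stays near 1, as for pure Poisson counts.

```python
import numpy as np


def simulate_poisson_mixed(K, depth, mu, sigma2_g, sigma2_e, rng):
    """Simulate counts y_i ~ Poisson(depth_i * exp(mu + g_i + e_i)).

    Assumed model form (illustrative, not the talk's exact specification):
    g ~ MVN(0, sigma2_g * K) captures relatedness via the kinship matrix K;
    e ~ N(0, sigma2_e * I) captures independent extra-Poisson noise.
    """
    n = K.shape[0]
    # Correlated random effect: scale a standard normal vector by the Cholesky factor of K
    L = np.linalg.cholesky(K)
    g = np.sqrt(sigma2_g) * (L @ rng.standard_normal(n))
    # Independent per-individual error term
    e = np.sqrt(sigma2_e) * rng.standard_normal(n)
    rate = depth * np.exp(mu + g + e)
    return rng.poisson(rate)


def dispersion_ratio(y):
    """Sample variance divided by sample mean; close to 1 for pure Poisson counts."""
    return y.var(ddof=1) / y.mean()


# Hypothetical cohort of 1000 sibling pairs (kinship coefficient 0.5 within each pair)
pairs = 1000
K = np.kron(np.eye(pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))
depth = np.full(2 * pairs, 20.0)  # assumed constant read depth per individual
rng = np.random.default_rng(0)

# Mixed model: relatedness plus independent noise on the log scale
y_mixed = simulate_poisson_mixed(K, depth, mu=0.0, sigma2_g=0.3, sigma2_e=0.2, rng=rng)
# Both variance components set to zero: plain Poisson counts
y_pois = simulate_poisson_mixed(K, depth, mu=0.0, sigma2_g=0.0, sigma2_e=0.0, rng=rng)

print(dispersion_ratio(y_mixed), dispersion_ratio(y_pois))
```

The kinship term g also correlates counts within each sibling pair, which is structure that a standard negative binomial model, treating individuals as independent, does not represent.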