PhD (Probability Theory and Mathematical Statistics, Northeast Normal University, China); BSc (Mathematics and Application Mathematics, Northeast Normal University, China)

My research interests are statistics and biostatistics, particularly statistical methods based on mixture models. For biological and genomic data, such as data on complex human diseases from genome-wide association studies (GWAS), I have worked on statistical methods for identifying the causes of disease. Based on genetic data, I have proposed a multisample mixture model for identifying genetic imprinting. For finite mixture models, I am also interested in large-sample theory, including the consistency of maximum likelihood estimators (MLEs) and the limiting distributions of likelihood ratio test (LRT) statistics. In addition, I have worked on mixture models with auxiliary information.
Likelihood Ratio Test for Multisample Mixture Model and Its Application to Genetic Imprinting
Genomic imprinting is a known aspect of the etiology of many diseases. Under imprinting, the expression level of an allele differs depending on its parental origin. When the parental origin is unknown, the expression level follows a finite normal mixture distribution. In such applications, a random sample of expression levels consists of three subsamples according to the number of minor alleles an individual possesses; one subsample follows the mixture and the other two are homogeneous. This structure leads to a likelihood ratio test (LRT) for the presence of imprinting. Because the finite mixture model is nonregular, the classical asymptotic results for likelihood-based inference do not apply. We show that the maximum likelihood estimator of the mixing distribution remains consistent. More interestingly, thanks to the homogeneous subsamples, the LRT statistic has an elegant and rather distinctive null limiting distribution, $0.5\chi^2_1 + 0.5\chi^2_2$. Simulation studies confirm that the limiting distribution provides precise approximations to the finite-sample distributions under various parameter settings. The LRT is applied to expression data, and our analyses provide evidence of imprinting for a number of isoform expressions.
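To illustrate how a null limit of the form 0.5χ²₁ + 0.5χ²₂ would be used in practice, here is a minimal sketch that turns an already-computed LRT statistic t into an asymptotic p-value and finds a critical value. It assumes only the limiting distribution stated in the abstract; fitting the multisample mixture and computing t itself are not shown, and the function names (`lrt_pvalue`, `critical_value`, etc.) are hypothetical, not taken from the paper. It relies on two standard facts: χ²₁ is the square of a standard normal, so P(χ²₁ > t) = erfc(√(t/2)), and χ²₂ is exponential with mean 2, so P(χ²₂ > t) = exp(−t/2).

```python
import math


def chi2_sf_df1(t: float) -> float:
    """Upper-tail probability P(chi^2_1 > t) for t >= 0."""
    # chi^2_1 = Z^2 with Z standard normal, so P(Z^2 > t) = erfc(sqrt(t / 2)).
    return math.erfc(math.sqrt(t / 2.0))


def chi2_sf_df2(t: float) -> float:
    """Upper-tail probability P(chi^2_2 > t) for t >= 0."""
    # chi^2_2 is exponential with mean 2.
    return math.exp(-t / 2.0)


def lrt_pvalue(t: float) -> float:
    """Asymptotic p-value assuming the null limit 0.5*chi^2_1 + 0.5*chi^2_2.

    Hypothetical helper: t is an LRT statistic computed elsewhere.
    """
    if t <= 0.0:
        # Both tail probabilities equal 1 at t = 0.
        return 1.0
    return 0.5 * chi2_sf_df1(t) + 0.5 * chi2_sf_df2(t)


def critical_value(alpha: float = 0.05, upper: float = 100.0) -> float:
    """Find t with lrt_pvalue(t) = alpha by bisection.

    The p-value is strictly decreasing in t, so bisection on [0, upper]
    converges for any alpha in (0, 1) whose critical value lies below upper.
    """
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lrt_pvalue(mid) > alpha:
            lo = mid  # not yet far enough into the tail
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"p-value at t = 5.0: {lrt_pvalue(5.0):.4f}")
    print(f"5% critical value: {critical_value(0.05):.3f}")
```

Because the mixture places half its weight on a χ²₂ component, the resulting critical value is larger than the familiar χ²₁ cutoff of about 3.84; using the χ²₁ reference alone would make such a test anticonservative. Whether this asymptotic approximation is adequate for a given sample size is exactly what the simulation studies mentioned in the abstract address.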