We consider the problem of exploiting the gene-environment independence assumption when a case-control study is used to infer the joint effect of genotype and environmental exposure on disease risk.
We first take a detour and develop the theory of constrained maximum likelihood estimation for parameters arising from a partially identified model, in which some parameters may be identified only through constraints imposed by additional assumptions. We show that, under certain conditions, the constrained maximum likelihood estimator exists and locally maximizes the likelihood function subject to the constraints. Moreover, we study the asymptotic distribution of the estimator and propose a numerical algorithm for computing it.
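As a toy illustration of constrained maximum likelihood (not the estimator or algorithm developed in this work), consider a 2x2 multinomial table whose cell probabilities are constrained to factor into row and column margins. In that textbook case the constrained maximizer has a closed form, the product of the observed marginal proportions. The function names and counts below are hypothetical and chosen only for illustration.

```python
import math


def multinomial_loglik(counts, probs):
    """Multinomial log-likelihood, up to an additive constant."""
    return sum(
        n * math.log(p)
        for row_n, row_p in zip(counts, probs)
        for n, p in zip(row_n, row_p)
        if n > 0
    )


def unconstrained_mle(counts):
    """Unconstrained MLE: the observed cell proportions."""
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]


def independence_constrained_mle(counts):
    """MLE under the constraint p[i][j] = row[i] * col[j]."""
    total = sum(sum(row) for row in counts)
    row = [sum(r) / total for r in counts]
    col = [sum(c) / total for c in zip(*counts)]
    return [[ri * cj for cj in col] for ri in row]


# Hypothetical counts, for illustration only.
counts = [[30, 20], [10, 40]]
p_free = unconstrained_mle(counts)
p_con = independence_constrained_mle(counts)
ll_free = multinomial_loglik(counts, p_free)
ll_con = multinomial_loglik(counts, p_con)
```

The constrained fit can never attain a higher likelihood than the unconstrained one. The gap between `ll_free` and `ll_con` is what a likelihood-ratio test of the constraint would be based on.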
Next, we take a frequentist approach to analyzing case-control data under the gene-environment independence assumption. By recasting the problem as one of constrained maximum likelihood estimation, we derive the asymptotic distribution of the estimator in closed form. We then show that exploiting the gene-environment independence assumption indeed improves estimation efficiency. We also propose an easy-to-implement numerical algorithm for computing the estimates in practice.
Furthermore, we approach the problem in a Bayesian framework. By introducing a different parameterization of the underlying model for case-control data, we define a prior structure that reflects the gene-environment independence assumption and develop an efficient numerical algorithm for computing the posterior distribution. The proposed Bayesian method is further generalized to address concerns about the validity of the gene-environment independence assumption.
Finally, we consider a special variant of the standard case-control design, the case-only design, and study the analysis of case-only data under the gene-environment independence assumption and the rare disease assumption. We show that the Bayesian method for analyzing case-control data is readily applicable to case-only data and offers the flexibility to incorporate different prior beliefs about disease prevalence.
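For orientation, the classical frequentist case-only estimator can be sketched as follows. It is not the Bayesian method proposed here. Under gene-environment independence and a rare disease, the genotype-exposure odds ratio among cases approximates the multiplicative interaction odds ratio. The sketch assumes binary genotype and exposure, a Wald interval on the log scale, and hypothetical counts; the function name is illustrative.

```python
import math


def case_only_interaction(cases, z=1.96):
    """Case-only estimate of the multiplicative G x E interaction odds ratio.

    cases[g][e] is the number of cases with genotype g and exposure e,
    where g and e are each coded 0 or 1. Returns the point estimate and
    a Wald confidence interval computed on the log scale.
    """
    a, b = cases[1][1], cases[1][0]
    c, d = cases[0][1], cases[0][0]
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, (lower, upper)


# Hypothetical case counts, for illustration only.
cases = [[200, 100], [50, 50]]
or_hat, (lower, upper) = case_only_interaction(cases)
```

The estimate is valid only if both assumptions hold. If genotype and exposure are associated in the source population, that association is mistaken for interaction.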