Jiahua Chen: Suggested Stat548 Papers for 2021


My main research activities lately concern statistical problems under finite mixture models and density ratio models. I develop EM-tests for the order of finite mixture models and apply empirical likelihood to inference problems under density ratio models. I can name many research problems in both directions, but they may not be suitable for new researchers. Viewing these topics from different angles, you may discover important and meaningful new research problems for yourself.
To work with me on a Stat548 project, aim to demonstrate your skill in one particular technical issue plus a general understanding of the big picture. After that, search for or create one or more meaningful data sets to critically examine the original or implied technical conclusions. Another possibility is to investigate the applicability of the statistical methodologies in these papers. Aim to fully understand these methodologies. Based on this understanding, search for theoretical or applied problems for which these methodologies should be suitable. Be pleasantly surprised if these methods lead to useful solutions. Your duty is not to report something publishable, but to give concrete reasoning on why the methods are deemed suitable and why and when they work (or do not work). In all cases, provide comprehensive justifications and avoid unsupported claims. These are skills you need in your upcoming adventures.
Plan to get the report done within 1.5 months. Tell me what you hope to cover in the report. We should jointly assess if your plan is feasible, meaningful, and time-appropriate.


You may obtain a general picture of my research activities on the following Google Scholar site:
Publications and citations



Specific recommendations



Maximum empirical likelihood estimation for abundance in a closed population from capture-recapture data
Yukun Liu, Pengfei Li and Jing Qin. (2017). Biometrika, Vol 104, 527-543.
Capture-recapture experiments are widely used to collect data needed for estimating the abundance of a closed population. To account for heterogeneity in the capture probabilities, Huggins (1989) and Alho (1990) proposed a semiparametric model in which the capture probabilities are modelled parametrically and the distribution of individual characteristics is left unspecified. A conditional likelihood method was then proposed to obtain point estimates and Wald-type confidence intervals for the abundance. Empirical studies show that the small-sample distribution of the maximum conditional likelihood estimator is strongly skewed to the right, which may produce Wald-type confidence intervals with lower limits that are less than the number of captured individuals or even are negative. In this paper, we propose a full empirical likelihood approach based on Huggins and Alho's model. We show that the null distribution of the empirical likelihood ratio for the abundance is asymptotically chi-squared with one degree of freedom, and that the maximum empirical likelihood estimator achieves semiparametric efficiency. Simulation studies show that the empirical likelihood-based method is superior to the conditional likelihood-based method: its confidence interval has much better coverage, and the maximum empirical likelihood estimator has a smaller mean square error. We analyse three datasets to illustrate the advantages of our empirical likelihood approach.
A student should go over all basic statistical concepts, models, tools, and mathematical derivations in this paper. For Stat548, one may skim the asymptotic derivations and skip the semiparametric efficiency. A student should exercise discretion about the level of technicality in the Stat548 report. Use simulation to verify various claims of this paper. For instance, does the Wald-type CI have negative lower limits often, or only in extreme cases?
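A simulation check of this kind could start from a sketch like the following. It rests on a deliberately simplified assumption: every individual has the same capture probability p on each of K occasions, with no covariates, so it is not the Huggins and Alho model studied in the paper. The function names, the default settings (N=100, K=3, p=0.1), and the delta-method variance used for the Wald interval are all illustrative assumptions.

```python
import math
import random


def cond_mle_p(mean_d, K, iters=80):
    """Solve K*p / (1 - (1-p)^K) = mean_d by bisection.

    This is the score equation of the zero-truncated binomial
    (conditional) likelihood under a homogeneous capture probability.
    """
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if K * mid / (1.0 - (1.0 - mid) ** K) < mean_d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wald_interval(d, K, z=1.96):
    """Wald-type interval for abundance from capture counts d (all >= 1)."""
    n, S = len(d), sum(d)
    p = cond_mle_p(S / n, K)
    q = 1.0 - p
    pi = 1.0 - q ** K                      # P(captured at least once)
    d1 = K * q ** (K - 1)                  # first derivative of pi in p
    d2 = -K * (K - 1) * q ** (K - 2)       # second derivative of pi in p
    # Observed information of the conditional log-likelihood at p-hat.
    info = S / p ** 2 + (n * K - S) / q ** 2 + n * (d2 * pi - d1 ** 2) / pi ** 2
    N_hat = n / pi                         # Horvitz-Thompson type estimator
    # Assumed variance: binomial variation in n plus delta-method term for p-hat.
    var = n * (1.0 - pi) / pi ** 2 + (n * d1 / pi ** 2) ** 2 / info
    half = z * math.sqrt(var)
    return N_hat, N_hat - half, N_hat + half


def simulate(N=100, K=3, p=0.1, reps=1000, seed=0):
    """Fraction of replicates whose Wald lower limit is below n, or below 0."""
    rng = random.Random(seed)
    used = below_n = negative = 0
    for _rep in range(reps):
        counts = [sum(rng.random() < p for _occ in range(K)) for _ind in range(N)]
        d = [c for c in counts if c > 0]
        n, S = len(d), sum(d)
        # Skip replicates where the conditional MLE sits on the boundary.
        if n == 0 or S == n or S == n * K:
            continue
        used += 1
        _, lower, _ = wald_interval(d, K)
        below_n += lower < n
        negative += lower < 0
    denom = max(used, 1)
    return {"used": used,
            "frac_below_n": below_n / denom,
            "frac_negative": negative / denom}


if __name__ == "__main__":
    print(simulate())
```

Varying N, K, and p in `simulate` shows how the answer depends on the capture rate. A report would then need to repeat the exercise with heterogeneous capture probabilities, which is the setting the paper actually addresses.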

Penalized Maximum Likelihood Estimator for Mixture of von Mises-Fisher Distributions

The paper can be obtained here
The von Mises-Fisher distribution is one of the most widely used probability distributions to describe directional data. Finite mixtures of von Mises-Fisher distributions have found numerous applications. However, the likelihood function for the finite mixture of von Mises-Fisher distributions is unbounded and consequently the maximum likelihood estimation is not well defined. To address the problem of likelihood degeneracy, we consider a penalized maximum likelihood approach whereby a penalty function is incorporated. We prove strong consistency of the resulting estimator. An Expectation-Maximization algorithm for the penalized likelihood function is developed and simulation studies are performed to examine its performance.
This is a typical technical paper that addresses a technical problem following the same routine from the literature. The paper contains all the necessary ingredients of a research project. I do not judge it as very novel or exemplary for a new researcher. Yet studying this paper is a good exercise to learn the routines of statistical research. A student who chooses this paper for Stat548 should go over all steps in the paper and formulate a comprehensive summary.
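The likelihood degeneracy mentioned in the abstract can be seen numerically. The sketch below assumes data on the unit sphere in three dimensions, where the von Mises-Fisher density has the closed form kappa * exp(kappa * (mu'x - 1)) / (2 * pi * (1 - exp(-2 * kappa))), and a two-component mixture with equal weights. One component mean is placed exactly on an observation and its concentration is increased. The function names and all numerical settings are illustrative assumptions, not taken from the paper.

```python
import math
import random


def log_vmf3(x, mu, kappa):
    """Log density of the von Mises-Fisher distribution on the sphere in R^3."""
    dot = sum(a * b for a, b in zip(x, mu))
    return (math.log(kappa) - math.log(2.0 * math.pi)
            - math.log1p(-math.exp(-2.0 * kappa)) + kappa * (dot - 1.0))


def mixture_loglik(data, mu1, kappa1, mu2, kappa2, w=0.5):
    """Log-likelihood of a two-component vMF mixture (log-sum-exp per point)."""
    total = 0.0
    for x in data:
        a = math.log(w) + log_vmf3(x, mu1, kappa1)
        b = math.log(1.0 - w) + log_vmf3(x, mu2, kappa2)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def random_unit_vectors(n, seed=0):
    """Uniform points on the unit sphere, obtained by normalising Gaussians."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        points.append(tuple(c / norm for c in v))
    return points


def degeneracy_path(data, kappas, mu2=(0.0, 0.0, 1.0), kappa2=1.0):
    """Log-likelihood with component 1 centred on data[0] as kappa1 grows."""
    return [mixture_loglik(data, data[0], k, mu2, kappa2) for k in kappas]


if __name__ == "__main__":
    data = random_unit_vectors(50)
    for k, ll in zip([1e2, 1e4, 1e6, 1e8],
                     degeneracy_path(data, [1e2, 1e4, 1e6, 1e8])):
        print(f"kappa1 = {k:.0e}, log-likelihood = {ll:.3f}")
```

Because the density of the first component at its own centre grows linearly in kappa1 while the second component keeps every other observation's contribution bounded below, the log-likelihood eventually increases without bound. A penalty on large concentrations is one way to remove such degenerate maxima.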


Empirical likelihood confidence intervals for complex sampling designs.

Berger and De La Riva Torres (2016, JRSS-B). Vol 78, 319-341.
We define an empirical likelihood approach which gives consistent design-based confidence intervals which can be calculated without the need of variance estimates, design effects, resampling, joint inclusion probabilities and linearization, even when the point estimator is not linear. It can be used to construct confidence intervals for a large class of sampling designs and estimators which are solutions of estimating equations. It can be used for means, regressions coefficients, quantiles, totals or counts even when the population size is unknown. It can be used with large sampling fractions and naturally includes calibration constraints. It can be viewed as an extension of the empirical likelihood approach to complex survey data. This approach is computationally simpler than the pseudoempirical likelihood and the bootstrap approaches. The simulation study shows that the confidence interval proposed may give better coverages than the confidence intervals based on linearization, bootstrap and pseudoempirical likelihood. Our simulation study shows that, under complex sampling designs, standard confidence intervals based on normality may have poor coverages, because point estimators may not follow a normal sampling distribution and their variance estimators may be biased.
There have been many versions of empirical likelihood in the context of survey sampling. The authors of this paper advocate one of their own that has many good properties. A Stat548 report should describe several sampling designs, several approaches, and the motivations of these approaches as discussed in this paper. Use simulation to critically examine some of their claims. Be selective about the issues to include so that you do not run out of time or hit unnecessary technical barriers.
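As a starting point for such a simulation, the sketch below estimates the design-based coverage of the usual normal-theory interval for a finite population mean. It assumes the simplest possible design, simple random sampling without replacement, and does not implement the empirical likelihood method or any of the complex designs of the paper. The function names, the two artificial populations, and the sample size are illustrative assumptions.

```python
import math
import random
import statistics


def srs_normal_ci_coverage(population, n, reps=2000, z=1.96, seed=0):
    """Coverage of ybar +/- z*se under simple random sampling without replacement."""
    rng = random.Random(seed)
    N = len(population)
    true_mean = sum(population) / N
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, n)
        ybar = statistics.fmean(sample)
        s2 = statistics.variance(sample)
        # Standard error with the finite population correction (1 - n/N).
        se = math.sqrt((1.0 - n / N) * s2 / n)
        hits += abs(ybar - true_mean) <= z * se
    return hits / reps


def make_population(kind, N=2000, seed=123):
    """Artificial finite populations: 'normal' (symmetric) or 'lognormal' (skewed)."""
    rng = random.Random(seed)
    if kind == "normal":
        return [rng.gauss(10.0, 2.0) for _ in range(N)]
    if kind == "lognormal":
        return [rng.lognormvariate(0.0, 1.5) for _ in range(N)]
    raise ValueError("kind must be 'normal' or 'lognormal'")


if __name__ == "__main__":
    for kind in ("normal", "lognormal"):
        pop = make_population(kind)
        cov = srs_normal_ci_coverage(pop, n=30)
        print(f"{kind:9s} population: estimated coverage = {cov:.3f}")
```

Comparing the two populations gives a first impression of how skewness affects the normal-theory interval at a nominal 95% level. The same loop can then be reused with other designs and with competing interval methods substituted for the normal-theory interval.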