Jiahua Chen Suggested Stat548 Papers for 2021.
My main research activities lately are on statistical problems
under finite mixture models and density ratio models.
I develop EM-tests for the order of finite mixture models and
apply empirical likelihood to inference problems under density ratio models.
I can name many research problems in both directions, but they may not be
suitable for new researchers.
Viewing from different angles, you may discover important and meaningful new
research problems for yourself.
To work with me on a Stat548 project, aim to demonstrate
your skill in one particular technical issue plus
a general understanding of the big picture.
After that, search for or create a meaningful data set to critically
examine the original or implied technical conclusions.
Another possibility is to investigate the applicability of the
statistical methodologies in these papers.
Aim to fully understand these methodologies. Based on
this understanding, search for theoretical or applied problems
for which these methodologies should be suitable.
Be pleasantly surprised if these methods lead to useful solutions.
Your duty is not to report something publishable,
but to give concrete reasoning on why these methods are
deemed suitable and on when and why they work (or do not).
In all cases, provide comprehensive justifications and avoid
unsupported claims. These are skills you need in your upcoming adventures.
Plan to get the report done within 1.5 months.
Tell me what you hope to cover in the report.
We should jointly assess whether your plan is feasible and meaningful.
You may obtain a general picture of my research activities at the following
Google Scholar page:
Publications and citations
Maximum empirical likelihood estimation for abundance in a closed population
from capture-recapture data
Yukun Liu, Pengfei Li and Jing Qin. (2017). Biometrika, Vol 104, 527-543.
Capture-recapture experiments are widely used to collect data needed
for estimating the abundance of a closed population.
To account for heterogeneity in the capture probabilities, Huggins (1989)
and Alho (1990) proposed a semiparametric model in which the capture probabilities are modelled
parametrically and the distribution of individual characteristics is left unspecified.
A conditional likelihood method was then proposed to obtain point estimates and
Wald-type confidence intervals for the abundance.
Empirical studies show that the small-sample distribution of the maximum
conditional likelihood estimator is strongly skewed to the right,
which may produce Wald-type confidence intervals with lower limits
that are less than the number of captured individuals or are even negative.
In this paper, we propose a full empirical likelihood approach based on Huggins and Alho's model.
We show that the null distribution of the empirical likelihood ratio for
the abundance is asymptotically chi-squared with one degree of freedom,
and that the maximum empirical likelihood estimator achieves semiparametric efficiency.
Simulation studies show that the empirical likelihood-based method is
superior to the conditional likelihood-based method:
its confidence interval has much better coverage,
and the maximum empirical likelihood estimator
has a smaller mean square error.
We analyse three datasets to illustrate the advantages of our
empirical likelihood approach.
A student should go over all basic statistical concepts, models,
tools, and mathematical derivations in this paper.
For Stat 548, one may skim the asymptotic
derivations and skip the semiparametric efficiency.
A student should develop a sense of the appropriate level
of technicality for the Stat548 report.
Use simulation to verify various claims of this paper.
For instance, does the Wald-type CI have negative lower limits
often, or only in extreme cases?
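One way to probe this question is a small simulation. The sketch below is a simplification: it uses the homogeneous-capture model M0 rather than the Huggins-Alho semiparametric model, maximizes the full likelihood in (N, p) numerically, builds a Wald interval from a finite-difference observed information matrix, and counts how often the lower limit falls below the number of captured individuals. All settings (N, T, p, the number of replications) are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(1)

def nll(theta, x, T):
    """Negative log-likelihood in (N, p) under model M0 (homogeneous capture).
    x holds the capture counts of the n observed individuals over T occasions."""
    N, p = theta
    n = x.size
    return -(gammaln(N + 1) - gammaln(N - n + 1)
             + (N - n) * T * np.log1p(-p)
             + x.sum() * np.log(p) + (T * n - x.sum()) * np.log1p(-p))

def wald_ci(x, T, z=1.96):
    """MLE of abundance N with a Wald interval from a finite-difference Hessian."""
    n = x.size
    res = minimize(lambda t: nll((n + np.exp(t[0]), 1 / (1 + np.exp(-t[1]))), x, T),
                   x0=[np.log(n), -1.0], method="Nelder-Mead")
    Nhat = n + np.exp(res.x[0])
    phat = 1 / (1 + np.exp(-res.x[1]))
    # observed information in the (N, p) parametrization by central differences
    h = np.array([1e-3 * Nhat, 1e-5])
    H = np.empty((2, 2))
    t0 = np.array([Nhat, phat])
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i] * h[i], np.eye(2)[j] * h[j]
            H[i, j] = (nll(t0 + ei + ej, x, T) - nll(t0 + ei - ej, x, T)
                       - nll(t0 - ei + ej, x, T) + nll(t0 - ei - ej, x, T)) / (4 * h[i] * h[j])
    se = np.sqrt(np.linalg.inv(H)[0, 0])
    return Nhat, Nhat - z * se, Nhat + z * se

N_true, T, p_true, reps = 100, 2, 0.15, 200
low = used = 0
for _ in range(reps):
    counts = rng.binomial(T, p_true, size=N_true)
    x = counts[counts > 0]          # animals never captured are unobserved
    if (x > 1).sum() == 0:
        continue                    # no recaptures: the MLE diverges; skip this sample
    Nhat, lo_lim, hi_lim = wald_ci(x, T)
    used += 1
    low += lo_lim < x.size          # lower limit below n (possibly negative)
print(f"fraction of Wald lower limits below n: {low / used:.2f} ({used} usable samples)")
```

Varying T and p here already hints at when the skewness of the estimator matters: the sparser the capture data, the more often the Wald lower limit is nonsensical.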
Penalized Maximum Likelihood Estimator for Mixture of
von Mises-Fisher Distributions
The paper can be obtained here
The von Mises-Fisher distribution is one of the most widely used probability distributions to
describe directional data. Finite mixtures of von Mises-Fisher distributions have found numerous
applications. However, the likelihood function for the finite mixture of von Mises-Fisher
distributions is unbounded and consequently the maximum likelihood estimation is not well defined.
To address the problem of likelihood degeneracy, we consider a penalized maximum likelihood
approach whereby a penalty function is incorporated. We prove strong consistency of the resulting
estimator. An Expectation-Maximization algorithm for the penalized likelihood function is developed
and simulation studies are performed to examine its performance.
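The unboundedness is easy to see numerically on the circle, the d = 2 (von Mises) case of the von Mises-Fisher family. The sketch below centres one mixture component on a single observation and lets its concentration grow: the mixture log-likelihood increases without bound, so the unpenalized MLE is not well defined. The toy sample, the equal weights, and the fixed second component are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.special import i0e

def vm_pdf(x, mu, kappa):
    # von Mises density; the exponentially scaled Bessel function i0e
    # keeps this numerically stable for very large concentrations
    return np.exp(kappa * (np.cos(x - mu) - 1.0)) / (2 * np.pi * i0e(kappa))

def mix_loglik(x, mu1, k1, mu2, k2):
    # two-component mixture with equal weights (an illustrative choice)
    return np.log(0.5 * vm_pdf(x, mu1, k1) + 0.5 * vm_pdf(x, mu2, k2)).sum()

x = np.linspace(0.3, 2 * np.pi - 0.3, 12)   # a fixed toy sample of angles
mu1 = x[0]                                  # centre component 1 on one datum
kappas = [1e0, 1e2, 1e4, 1e8, 1e12]
lls = [mix_loglik(x, mu1, k1, np.pi, 1.0) for k1 in kappas]
for k1, ll in zip(kappas, lls):
    print(f"kappa1 = {k1:8.0e}  log-likelihood = {ll:9.2f}")
```

The spike's contribution grows like 0.5 * log(kappa1) while every other term stays bounded below by the second component, which is exactly the degeneracy the penalty is designed to remove.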
This is a typical technical paper that answers a technical problem by following a
standard routine from the literature.
The paper contains all the necessary ingredients of a research project.
I do not judge it as very novel or exemplary for a new researcher.
Yet studying this paper is a good exercise to learn the routines of this type of research.
A student who chooses this paper for Stat 548 should go over all steps in the paper
and formulate a comprehensive summary.
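For the summary it also helps to implement the method once in a simple case. The sketch below runs EM for a two-component von Mises mixture (the circular case) with a linear penalty -c * kappa_j on each concentration. This penalty form and all settings are illustrative assumptions, not the paper's actual penalty function; the point is only to show how a penalty enters the M-step.

```python
import numpy as np
from scipy.special import i0e, i1e
from scipy.optimize import brentq

def vm_logpdf(x, mu, kappa):
    # log von Mises density via the exponentially scaled Bessel function
    return kappa * (np.cos(x - mu) - 1.0) - np.log(2 * np.pi * i0e(kappa))

def solve_kappa(target):
    """Solve A(kappa) = I1(kappa)/I0(kappa) = target for kappa."""
    target = min(max(target, 1e-6), 1 - 1e-6)   # keep the root bracketed
    return brentq(lambda k: i1e(k) / i0e(k) - target, 1e-6, 1e6)

def em_vm_mixture(x, K=2, pen=1.0, iters=200, seed=0):
    """EM for a K-component von Mises mixture, with an illustrative
    penalty -pen * kappa_j per component added to the log-likelihood."""
    rng = np.random.default_rng(seed)
    n = x.size
    w = np.full(K, 1.0 / K)
    mu = rng.uniform(0, 2 * np.pi, K)
    kappa = np.ones(K)
    for _ in range(iters):
        # E-step: posterior responsibilities, computed on the log scale
        logp = np.stack([np.log(w[j]) + vm_logpdf(x, mu[j], kappa[j]) for j in range(K)])
        logp -= logp.max(axis=0)
        r = np.exp(logp)
        r /= r.sum(axis=0)
        # M-step
        for j in range(K):
            nj = r[j].sum()
            w[j] = nj / n
            C, S = (r[j] * np.cos(x)).sum(), (r[j] * np.sin(x)).sum()
            mu[j] = np.arctan2(S, C)
            Rj = np.hypot(C, S)
            # the penalty shifts the usual M-step equation A(kappa) = Rj/nj
            # to A(kappa) = (Rj - pen)/nj, which bounds kappa when a
            # component starts collapsing onto a few points
            kappa[j] = solve_kappa((Rj - pen) / nj)
    return w, mu, kappa

rng = np.random.default_rng(3)
x = np.concatenate([rng.vonmises(0.0, 5.0, 150),
                    rng.vonmises(np.pi, 5.0, 150)]) % (2 * np.pi)
w, mu, kappa = em_vm_mixture(x)
print(np.round(w, 2), np.round(mu % (2 * np.pi), 2), np.round(kappa, 1))
```

Comparing runs with pen = 0 (where a component can blow up on a near-duplicate point) against pen > 0 is a direct way to see what the penalization buys.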
Empirical likelihood confidence intervals for complex sampling designs.
Berger and De La Riva Torres (2016, JRSS-B). Vol 78, 319-341.
We define an empirical likelihood approach which gives consistent design-based confidence intervals
which can be calculated without the need for variance estimates, design effects, resampling, joint
inclusion probabilities and linearization, even when the point estimator is not linear. It can be
used to construct confidence intervals for a large class of sampling designs and estimators which
are solutions of estimating equations. It can be used for means, regression coefficients,
quantiles, totals or counts even when the population size is unknown. It can be used with large
sampling fractions and naturally includes calibration constraints. It can be viewed as an extension
of the empirical likelihood approach to complex survey data. This approach is computationally
simpler than the pseudo-empirical likelihood and the bootstrap approaches. The simulation study
shows that the confidence interval proposed may give better coverages than the confidence intervals
based on linearization, bootstrap and pseudo-empirical likelihood. Our simulation study shows that,
under complex sampling designs, standard confidence intervals based on normality may have poor
coverages, because point estimators may not follow a normal sampling distribution and their
variance estimators may be biased.
There have been many versions of empirical likelihood in the context of the sampling survey.
The authors of this paper advocate one of their own that has many good properties.
A Stat548 report should describe several sampling designs, several approaches, and the
motivations of these approaches as discussed in this paper.
Use simulation to critically examine some of their claims.
Be selective on the issues to be included so that you will not run out of time
and hit unnecessarily technical barriers.
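Before tackling complex designs, it is worth coding the basic ingredient once: Owen's empirical likelihood for a mean under iid sampling, calibrated by its chi-squared limit. The sketch below checks the coverage of the EL ratio numerically; the design-based, survey-weighted version in the paper is more involved, and the exponential population and sample size here are illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_logratio(x, mu):
    """-2 log empirical likelihood ratio for the mean (Owen, iid case)."""
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                # mu outside the convex hull of the data
    # the Lagrange multiplier solves sum(z_i / (1 + lam * z_i)) = 0,
    # with 1 + lam * z_i > 0 so that all implied weights are positive
    g = lambda lam: np.sum(z / (1.0 + lam * z))
    lo = -1.0 / z.max() + 1e-10
    hi = -1.0 / z.min() - 1e-10
    lam = brentq(g, lo, hi)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(7)
n, mu0, reps = 50, 1.0, 500
cover = sum(el_logratio(rng.exponential(mu0, n), mu0) <= chi2.ppf(0.95, 1)
            for _ in range(reps))
print(f"empirical coverage at nominal 95%: {cover / reps:.3f}")
```

Repeating this with unequal-probability samples (and without the survey corrections the authors propose) is one concrete way to reproduce the poor coverage of naive intervals that their simulations report.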