Jiahua Chen: Suggested Stat 548 Papers for 2023

I am active in the areas of finite mixture models, empirical likelihood, and missing data problems in sample surveys. Students interested in joining my research team under my supervision should demonstrate a strong aptitude for statistical theory and the underlying mathematical skills, a keen appreciation for the significance of published results, and the ability to identify their limitations and key conclusions.
A student can select one of the two papers provided below to complete a Stat 548 project under my supervision. To attain a high grade, the student must showcase expertise in a specific technical aspect of the chosen paper while also demonstrating a broad comprehension of the overall context. While replicating the core steps of the theoretical derivations, students have the flexibility to adopt their unique approach. This entails omitting routine but intricate algebraic processes, assuming and stating intermediate conclusions without proofs, all while ensuring the fundamental essence of the paper remains intact. The ability to discern what is pivotal for inclusion, rather than being explicitly directed, is an integral aspect of this exercise.
Regarding methodology, students are encouraged to construct concrete and/or hypothetical scenarios to critically evaluate the effectiveness of the proposed methods discussed in the paper. The ensuing report should elucidate the rationale behind the chosen scenarios and articulate the insights anticipated from the corresponding simulation results.
Aim to complete the report within 1.5 months. I encourage you to read the selected paper promptly. If you find it intriguing, kindly jot down your initial impressions. Create an outline that highlights the specific topics and the level of detail you intend to incorporate into the report. Together, we can evaluate the feasibility, significance, and appropriateness of the plan in terms of time investment.

You can get a general picture of my research activities from my Google Scholar page:
Publications and citations


Nearest Neighbor Imputation for Survey Data. Jiahua Chen and Jun Shao (Journal of Official Statistics, Vol. 16, 2000, pp. 113-131).

Nearest neighbor imputation is one of the hot deck methods used to compensate for nonresponse in sample surveys. Although it has a long history of application, few theoretical properties of the nearest neighbor imputation method are known prior to the current article. We show that under some conditions, the nearest neighbor imputation method provides asymptotically unbiased and consistent estimators of functions of population means (or totals), population distributions, and population quantiles. We also derive the asymptotic variances for estimators based on nearest neighbor imputation and consistent estimators of these asymptotic variances. Some simulation results show that the estimators based on nearest neighbor imputation and the proposed variance estimators have good performances.
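
As a starting point for the kind of simulation scenario a report might construct, here is a minimal sketch of nearest neighbor imputation for a population mean. It is not the paper's simulation design: the single auxiliary variable `x`, the linear model for `y`, the response probabilities, and the function names `nn_impute` and `simulate_once` are all assumptions made for illustration, and simple random sampling is assumed throughout.

```python
import numpy as np

def nn_impute(x, y, responded):
    """Fill each missing y with the y of the respondent whose x is closest.

    Assumes a single auxiliary variable x observed for every unit
    (an illustrative simplification, not the paper's general setting).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    donors_x = x[responded]
    donors_y = y[responded]
    completed = y.copy()
    for i in np.flatnonzero(~responded):
        # Nearest donor in x; ties go to the first donor found.
        j = np.argmin(np.abs(donors_x - x[i]))
        completed[i] = donors_y[j]
    return completed

def simulate_once(rng, n=200):
    """One replicate of a hypothetical scenario where response depends on x."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)  # assumed model, E[y] = 2
    p_respond = 0.4 + 0.5 * x                          # assumed response mechanism
    responded = rng.uniform(size=n) < p_respond
    y_obs = np.where(responded, y, np.nan)
    respondent_mean = y_obs[responded].mean()
    nn_mean = nn_impute(x, y_obs, responded).mean()
    return respondent_mean, nn_mean

rng = np.random.default_rng(2023)
results = np.array([simulate_once(rng) for _ in range(500)])
true_mean = 2.0
bias_respondent_mean, bias_nn = results.mean(axis=0) - true_mean
print(f"bias of respondent-only mean: {bias_respondent_mean:.3f}")
print(f"bias of NN-imputed mean:      {bias_nn:.3f}")
```

Because units with larger `x` respond more often in this assumed scenario, the respondent-only mean should overestimate the population mean, while the imputed mean should be close to unbiased. A report would need to justify its own choice of scenario and go further, for example by examining the variance estimators.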

Download the paper from the link provided by Google Scholar.

Testing homogeneity in a multivariate mixture model
Xiaoqing Niu, Pengfei Li, and Peng Zhang. The Canadian Journal of Statistics, 39, 218-238.

Testing homogeneity is a fundamental problem in finite mixture models. It has been investigated by many researchers and most of the existing works have focused on the univariate case. In this article, the authors extend the use of the EM-test for testing homogeneity to multivariate mixture models. They show that the EM-test statistic asymptotically has the same distribution as a certain transformation of a single multivariate normal vector. On the basis of this result, they suggest a resampling procedure to approximate the p-value of the EM-test. Simulation studies show that the EM-test has accurate type I errors and adequate power, and is more powerful and computationally efficient than the bootstrap likelihood ratio test. Two real data sets are analysed to illustrate the application of our theoretical results.
Download the paper from the link provided by Google Scholar.
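
The resampling idea mentioned in the abstract above can be sketched generically: if a test statistic has the same limiting null distribution as g(Z) for a multivariate normal vector Z, its p-value can be approximated by drawing many copies of Z and comparing g(Z) with the observed statistic. The abstract does not state the transformation or the covariance matrix, so `placeholder_transform`, the identity covariance, and the function name `resampling_p_value` below are hypothetical stand-ins, not the paper's procedure.

```python
import numpy as np

def resampling_p_value(observed_stat, sigma, transform, n_draws=10000, seed=0):
    """Monte Carlo p-value when the null limit is transform(Z), Z ~ N(0, sigma).

    In practice sigma would be estimated from the data; here it is simply
    passed in by the caller.
    """
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(sigma.shape[0]), sigma, size=n_draws)
    null_stats = np.array([transform(z) for z in draws])
    # Proportion of simulated null statistics at least as large as the observed one.
    return float(np.mean(null_stats >= observed_stat))

def placeholder_transform(z):
    # Hypothetical stand-in, NOT the transformation derived in the paper:
    # sum of squared positive parts of the normal vector.
    return float(np.sum(np.maximum(z, 0.0) ** 2))

sigma = np.eye(2)  # assumed covariance, for illustration only
p_value = resampling_p_value(3.0, sigma, placeholder_transform)
print(f"approximate p-value: {p_value:.3f}")
```

A project on this paper would replace the placeholder with the transformation and covariance estimate actually derived there, and could then compare the resulting p-values against a bootstrap likelihood ratio test in simulation.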