ML-assisted statistical inference for genetic discovery

Thursday, January 30, 2025 - 10:30 to 11:30
Jiacheng Miao, Ph.D. student, Biomedical Data Science, University of Wisconsin-Madison
Statistics Seminar
ESB 4192 / Zoom

To join this seminar virtually: Please request Zoom connection details from ea [at] stat.ubc.ca

Abstract: AI/ML applications have quickly gained popularity in many scientific domains and, in some cases, have even started replacing conventional approaches for data collection. However, the reliability of scientific findings based purely on ML-derived outcomes remains largely unexplored. In this talk, I will demonstrate that genetic association analysis based on ML-derived phenotypic outcomes can lead to pervasive false-positive findings. To address this, I will introduce a statistical framework named "POP-GWAS" for ML-assisted statistical inference for genetic discovery. It ensures valid and efficient inference given arbitrary "black-box" ML predictions. Moreover, the framework only requires summary statistics as input, enabling computationally efficient application at the biobank scale. Using POP-GWAS, I performed the largest genome-wide association study (GWAS) to date on bone mineral density derived from dual-energy X-ray absorptiometry imaging at 14 skeletal sites, achieving a 9.7% to 50.7% gain in effective sample size compared to conventional approaches. This new approach identified 89 novel genetic associations and many complex traits showing significant skeletal-site-specific genetic correlations with bone mineral density. In addition, I will discuss the extension of this framework to general statistical tasks, providing both theoretical insights on statistical optimality and practical implications of summary-statistics-based statistical inference. Finally, I will give a brief overview of my research program, covering topics from quantifying gene-environment interactions to advanced genetic risk prediction in diverse ancestries.