Valid Inference After Hierarchical Clustering

To join via Zoom: Please request Zoom connection details from headsec@stat.ubc.ca

Title: Valid Inference After Hierarchical Clustering

Abstract: Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Classical tests control the type I error rate when the groups are defined a priori. However, if the groups are instead defined using a clustering algorithm, then applying a classical test yields an extremely inflated type I error rate. Surprisingly, this problem persists even if two separate and independent data sets are used for clustering and for hypothesis testing.
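The inflation described above can be illustrated with a small simulation. The sketch below is not taken from the talk: it assumes one-dimensional standard normal data with no true groups, a naive average-linkage clustering cut at two clusters, and a two-sided z-test with known variance 1. The function names are hypothetical and chosen only for this illustration.

```python
import math
import random


def average_linkage_two_clusters(x):
    """Naive agglomerative clustering (average linkage) of 1-D points,
    stopped when two clusters remain. Returns two lists of indices."""
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between clusters a and b.
                d = sum(abs(x[i] - x[j]) for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def naive_z_test_pvalue(x, c1, c2, sigma=1.0):
    """Two-sided z-test for a difference in means that (wrongly) treats
    the clusters as if they had been fixed before seeing the data."""
    m1 = sum(x[i] for i in c1) / len(c1)
    m2 = sum(x[i] for i in c2) / len(c2)
    se = sigma * math.sqrt(1.0 / len(c1) + 1.0 / len(c2))
    z = abs(m1 - m2) / se
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


def simulate_naive_rejection_rate(n=30, reps=200, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no true clusters
        c1, c2 = average_linkage_two_clusters(x)
        if naive_z_test_pvalue(x, c1, c2) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    rate = simulate_naive_rejection_rate()
    print(f"naive rejection rate under the null: {rate:.2f} (nominal 0.05)")
```

Because every simulated data set comes from a single normal distribution, a valid level-0.05 test should reject about 5% of the time; the naive test rejects far more often, since the clustering step selects groups precisely because their means look different.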

In this talk, I will propose a test for a difference in means between two estimated clusters that accounts for the fact that the null hypothesis is a function of the data, using a selective inference framework. Then, I will describe how to efficiently compute exact p-values for clusters obtained using hierarchical clustering. I will also show an application in the context of single-cell RNA-sequencing data, where it is common for researchers to cluster the cells, then test for a difference in mean gene expression between the clusters.
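In symbols, the idea of a null hypothesis that depends on the data can be sketched as follows. The notation here is an assumption of this sketch and does not appear in the abstract: $X$ is the random data matrix with observed value $x$, $\mathcal{C}(\cdot)$ is the clustering procedure, $\hat{C}_1$ and $\hat{C}_2$ are the two estimated clusters, and bars denote within-cluster means. The test presented in the talk may condition on additional events for computational reasons.

```latex
% The null hypothesis is indexed by clusters estimated from the data.
\[
H_0 : \bar{\mu}_{\hat{C}_1} = \bar{\mu}_{\hat{C}_2}
\qquad \text{versus} \qquad
H_1 : \bar{\mu}_{\hat{C}_1} \neq \bar{\mu}_{\hat{C}_2}
\]

% A selective p-value conditions on the event that the clustering
% procedure produced exactly these two clusters.
\[
p = \mathbb{P}_{H_0}\!\left(
  \lVert \bar{X}_{\hat{C}_1} - \bar{X}_{\hat{C}_2} \rVert_2
  \ge
  \lVert \bar{x}_{\hat{C}_1} - \bar{x}_{\hat{C}_2} \rVert_2
  \;\middle|\;
  \hat{C}_1, \hat{C}_2 \in \mathcal{C}(X)
\right)
\]
```

Conditioning on the clustering event is what distinguishes this from a classical two-sample test, which computes the same tail probability without the conditioning.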

This talk is based on joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).

Event type: Statistics Seminar
Location: Zoom
Event date: Thu, 02/03/2022, 11:00 - 12:00
Speaker: Lucy Gao, Assistant Professor, Department of Statistics and Actuarial Science, University of Waterloo