Valid Inference After Hierarchical Clustering

Thursday, February 3, 2022 - 11:00 to 12:00
Lucy Gao, Assistant Professor, Department of Statistics and Actuarial Science, University of Waterloo
Statistics Seminar

To join via Zoom: please request connection details from headsec [at]

Title: Valid Inference After Hierarchical Clustering 

Abstract: Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Classical tests control the type I error rate when the groups are defined a priori. However, if the groups are instead defined using a clustering algorithm, then applying a classical test yields an extremely inflated type I error rate. Surprisingly, this problem persists even if two separate and independent data sets are used for clustering and for hypothesis testing.
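The inflation described above can be reproduced with a small simulation. The sketch below is an illustration only and is not the speaker's code: it assumes one-dimensional standard normal data with no true clusters, a known variance of 1, a two-sided z-test in place of whichever classical test a researcher might use, and average-linkage hierarchical clustering cut at two clusters. All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def two_clusters_average_linkage(xs):
    """Cut a 1-D average-linkage dendrogram at two clusters."""
    clusters = [[x] for x in sorted(xs)]
    while len(clusters) > 2:
        # For sorted 1-D clusters, the average-linkage distance between two
        # clusters equals the gap between their means, so the closest pair
        # is always a pair of neighbours.
        means = [fmean(c) for c in clusters]
        i = min(range(len(clusters) - 1), key=lambda j: means[j + 1] - means[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def z_test_pvalue(a, b, sigma=1.0):
    """Two-sided z-test for a difference in means with known sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rates(n=30, n_sim=1000, alpha=0.05, seed=0):
    """Return (rate with estimated clusters, rate with a priori groups)."""
    rng = random.Random(seed)
    naive = fixed = 0
    for _ in range(n_sim):
        # Null data: every observation has the same mean.
        xs = [rng.gauss(0, 1) for _ in range(n)]
        a, b = two_clusters_average_linkage(xs)
        naive += int(z_test_pvalue(a, b) < alpha)
        # Groups fixed before looking at the data: first half vs second half.
        fixed += int(z_test_pvalue(xs[:n // 2], xs[n // 2:]) < alpha)
    return naive / n_sim, fixed / n_sim


if __name__ == "__main__":
    naive_rate, fixed_rate = rejection_rates()
    print(f"clusters estimated from the data: {naive_rate:.3f}")
    print(f"groups defined a priori:          {fixed_rate:.3f}")
```

With groups fixed in advance, the rejection rate should stay close to the nominal 5% level. With groups produced by clustering, it should be far higher, because the clustering step itself selects groups whose means differ.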

In this talk, I will propose a test for a difference in means between two estimated clusters that accounts for the fact that the null hypothesis is a function of the data, using a selective inference framework. Then, I will describe how to efficiently compute exact p-values for clusters obtained using hierarchical clustering. I will also show an application in the context of single-cell RNA-sequencing data, where it is common for researchers to cluster the cells, then test for a difference in mean gene expression between the clusters.
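The general idea of a selective p-value can be written down as follows. This is a sketch in assumed notation, not necessarily the exact formulation used in the talk: $x$ is the observed data, $X$ a random draw from the model, $\hat{C}_1, \hat{C}_2$ the two estimated clusters, $\mathcal{C}(\cdot)$ the clustering procedure, $\bar{\mu}_{C}$ the mean of the true means of the observations in cluster $C$, and $T$ a statistic measuring the difference in cluster means.

```latex
% Null hypothesis: it depends on the data through the estimated clusters.
H_0 :\ \bar{\mu}_{\hat{C}_1} = \bar{\mu}_{\hat{C}_2}

% Naive p-value: ignores that the clusters were chosen using x.
p_{\text{naive}} = \mathbb{P}_{H_0}\bigl( T(X) \ge T(x) \bigr)

% Selective p-value: conditions on the event that clustering X
% produces the same two clusters that were tested.
p_{\text{selective}} = \mathbb{P}_{H_0}\bigl( T(X) \ge T(x) \,\bigm|\, \hat{C}_1, \hat{C}_2 \in \mathcal{C}(X) \bigr)
```

Conditioning on the clustering event is what restores control of the type I error rate. A practical test may condition on additional events to make the conditional distribution tractable to compute.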

This talk is based on joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).