Valid Inference After Hierarchical Clustering

Thursday, February 3, 2022 - 11:00 to 12:00
Lucy Gao, Assistant Professor, Department of Statistics and Actuarial Science, University of Waterloo
Statistics Seminar
Zoom

To join via Zoom: please request connection details from headsec [at] stat.ubc.ca

Abstract: Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Classical tests control the type I error rate when the groups are defined a priori. However, if the groups are instead defined using a clustering algorithm, then applying a classical test yields an extremely inflated type I error rate. Surprisingly, this problem persists even if two separate and independent data sets are used for clustering and for hypothesis testing.
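The inflation described above can be illustrated with a small simulation. The following is a minimal sketch and is not taken from the talk. It assumes one-dimensional data drawn from a single N(0, 1) population (so there are no true clusters), a known standard deviation, average-linkage hierarchical clustering cut at two clusters, and a classical two-sample z-test. The function names and parameter values are hypothetical choices made for this illustration.

```python
import random
from statistics import NormalDist, fmean


def average_linkage_two_clusters(xs):
    """Naive agglomerative average-linkage clustering of 1-D points, cut at 2 clusters."""
    clusters = [[x] for x in xs]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between cluster i and cluster j.
                total = sum(abs(a - b) for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


def z_test_p_value(g1, g2, sigma=1.0):
    """Two-sided z-test for a difference in means, assuming known sigma."""
    se = sigma * (1 / len(g1) + 1 / len(g2)) ** 0.5
    z = (fmean(g1) - fmean(g2)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(cluster_first, n=20, reps=500, alpha=0.05, seed=0):
    """Fraction of null data sets on which the z-test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]  # one population: the null is true
        if cluster_first:
            g1, g2 = average_linkage_two_clusters(xs)  # groups chosen from the data
        else:
            g1, g2 = xs[: n // 2], xs[n // 2:]  # groups fixed a priori
        if z_test_p_value(g1, g2) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("groups fixed a priori:   ", rejection_rate(cluster_first=False))
    print("groups from clustering:  ", rejection_rate(cluster_first=True))
```

With groups fixed in advance, the rejection rate should stay near the nominal level of 0.05. With groups produced by clustering the same data, it should be far higher, even though every observation comes from the same distribution.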

In this talk, I will propose a test for a difference in means between two estimated clusters that accounts for the fact that the null hypothesis is a function of the data, using a selective inference framework. Then, I will describe how to efficiently compute exact p-values for clusters obtained using hierarchical clustering. I will also show an application in the context of single-cell RNA-sequencing data, where it is common for researchers to cluster the cells, then test for a difference in mean gene expression between the clusters.

This talk is based on joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).