Generalized Data Thinning Using Sufficient Statistics

Thursday, May 9, 2024 - 11:00 to 12:00
Jacob Bien, Professor of Data Sciences and Operations, Marshall School of Business, University of Southern California
Statistics Seminar
ESB 5104 / Zoom

To join this seminar virtually: Please request Zoom connection details from ea [at] stat.ubc.ca.

Abstract: Sample splitting is one of the most tried-and-true tools in the data scientist's toolbox. It breaks a data set into two independent parts, allowing one to perform valid inference after an exploratory analysis or after training a model. A recent paper (Neufeld et al., 2023) provided a remarkable alternative to sample splitting, which the authors showed to be attractive in situations where sample splitting is not possible. Their method, called convolution-closed data thinning, proceeds very differently from sample splitting, and yet it also produces two statistically independent data sets from the original. In this talk, we will show that sufficiency is the key underlying principle that makes their approach possible. This insight leads naturally to a new framework, which we call generalized data thinning. This generalization unifies both sample splitting and convolution-closed data thinning as different applications of the same procedure. Furthermore, we show that this generalization greatly widens the scope of distributions where thinning is possible.
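For concreteness, the sketch below illustrates the Poisson case of convolution-closed thinning: each Poisson count is split with a binomial draw, and the two resulting pieces are themselves independent Poisson variables. The parameter values and the 50/50 thinning fraction are illustrative choices, not taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original data: n counts drawn from Poisson(lam). (Illustrative parameters.)
lam, n, eps = 5.0, 100_000, 0.5
x = rng.poisson(lam, size=n)

# Thinning step: split each count X into X1 ~ Binomial(X, eps) and X2 = X - X1.
# Then X1 ~ Poisson(eps * lam), X2 ~ Poisson((1 - eps) * lam), and X1, X2 are independent.
x1 = rng.binomial(x, eps)
x2 = x - x1

print("mean of X1:", x1.mean())                     # approx eps * lam = 2.5
print("mean of X2:", x2.mean())                     # approx (1 - eps) * lam = 2.5
print("corr(X1, X2):", np.corrcoef(x1, x2)[0, 1])   # approx 0, consistent with independence
```

Unlike sample splitting, which assigns whole observations to one part or the other, this split operates within each observation, which is what makes it usable when there is effectively only one observation per parameter of interest.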