To join via Zoom: Please register here.
Time: 11:00am – 11:30am
Speaker: Johnny Xi, UBC Statistics MSc student
Title: Indeterminacy in Latent Variable Models: Characterization and Strong Identifiability
Abstract: The history of latent variable models spans nearly 100 years, from factor analysis to modern unsupervised machine learning. An enduring goal is to interpret the latent variables as "true" factors of variation, unique to a sample. Unfortunately, modern non-linear methods are wildly underdetermined, leading to many possible, equally valid solutions, even in the limit of infinite data. I will describe a theoretical framework that rigorously formulates the uniqueness problem as statistical identifiability, unifying existing progress towards this goal. The framework explicitly characterizes the sources of non-identifiability, making it possible to design strongly identifiable latent variable models in a transparent way. Using insights derived from the framework, our work proposes two flexible non-linear models with unique latent variables.
Time: 11:30am – 12:00pm
Speaker: Naitong Chen, UBC Statistics MSc student
Title: Bayesian Inference via Sparse Hamiltonian Flows
Abstract: A Bayesian coreset is a small, weighted subset of data that replaces the full dataset during Bayesian inference, with the goal of reducing computational cost. Although past work has shown empirically that there often exists a coreset with low inferential error, efficiently constructing such a coreset remains a challenge. Current methods tend to be slow, require a secondary inference step after coreset construction, and do not provide bounds on the data marginal evidence. In this work, we introduce a new method, sparse Hamiltonian flows, that addresses all three of these challenges. The method involves first subsampling the data uniformly, and then optimizing a Hamiltonian flow parametrized by coreset weights and including periodic momentum quasi-refreshment steps. Theoretical results show that the method enables an exponential compression of the dataset in a representative model, and that the quasi-refreshment steps reduce the KL divergence to the target. Real and synthetic experiments demonstrate that sparse Hamiltonian flows provide accurate posterior approximations with significantly reduced runtime compared with competing dynamical-system-based inference methods.