To join this seminar virtually: https://ubc.zoom.us/j/68285564037?pwd=R2ZpLy9uc2pUYldHT3laK3orakg0dz09
Meeting ID: 682 8556 4037 Passcode: 636252
Abstract: Since shortly after the popularization of stochastic gradient optimization methods in machine learning (which now scale model training to billions of examples and beyond), researchers have been trying to use the same basic data subsampling techniques to speed up computational Bayesian inference algorithms. In this talk, I'll cover the broad classes of methods that have been developed, highlights of progress in the field, and the current state of the art. Along the way I'll introduce some recent work from my group on scalable Bayesian inference via coresets, i.e., sparse dataset summaries. I'll show that coresets offer an exponential compression of the data (and so an exponential speed-up of methods like Markov chain Monte Carlo) in a wide variety of models, and that they can be constructed in an automated manner, with theoretical convergence guarantees, and without requiring special knowledge of model structure beyond conditional independence of the data. While other methods implicitly rely on asymptotic Gaussianity of the posterior, coresets are particularly amenable to posteriors that don't exhibit this usual asymptotic behaviour, e.g., those with discrete variables, weak or no identifiability, low-dimensional manifold structure, etc. I'll conclude with empirical results and a discussion of next steps for Bayesian inference in the big data regime.
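To make the coreset idea concrete, here is a minimal Python sketch (not the speaker's implementation): the full-data log-likelihood, a sum over all N points, is replaced by a sparse weighted sum over a small subset, which is what lets MCMC evaluate the posterior cheaply. The 1-D Gaussian model and the uniform-subsample weights below are illustrative placeholders only; the coreset constructions discussed in the talk choose the subset and weights to control approximation error with guarantees.

# Minimal sketch of the coreset interface (assumed example, not the talk's code):
# approximate sum_n log p(x_n | theta) by sum_m w_m log p(x_{i_m} | theta),
# where only a few weights w_m are nonzero.
import numpy as np

def log_lik(theta, x):
    # Per-datapoint log-likelihood of an N(theta, 1) model (placeholder model).
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)

def full_log_lik(theta, x):
    # Full-data log-likelihood: O(N) per evaluation.
    return np.sum(log_lik(theta, x))

def coreset_log_lik(theta, x, idx, w):
    # Coreset log-likelihood: weighted sum over a small subset, O(M) per evaluation.
    return np.sum(w * log_lik(theta, x[idx]))

rng = np.random.default_rng(0)
N, M = 100_000, 100                      # full dataset size vs. coreset size
x = rng.normal(loc=1.0, scale=1.0, size=N)

# Naive placeholder construction: uniform subsample with weights N/M each.
# (Real coreset algorithms optimize idx and w rather than sampling uniformly.)
idx = rng.choice(N, size=M, replace=False)
w = np.full(M, N / M)

for theta in (0.0, 1.0, 2.0):
    print(theta, full_log_lik(theta, x), coreset_log_lik(theta, x, idx, w))

An MCMC sampler would simply call coreset_log_lik in place of full_log_lik when evaluating the (unnormalized) log posterior, so each iteration touches M points instead of N.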