University of British Columbia
bouchard@stat.ubc.ca
The prevalence of uncertainty in our world has fuelled the development of sophisticated mathematical methods to understand and tame uncertainty. This has been a central quest in the field of statistics. A key concept often used to depict uncertainty is the notion of a probability distribution, which can be thought of as measuring, for each possible state of the system, a degree of belief. Being able to interrogate probability distributions is therefore of paramount importance in statistics, and hence in the many fields of science and engineering that depend on statistics and uncertainty quantification. As scientific models become increasingly complex, the calculations required to query probability distributions are becoming computationally prohibitive, to the point that these computations are the bottleneck in many disciplines. My research focuses on computational methods that break these bottlenecks using algorithms that exploit randomness.
JRSSB paper on a new perspective on Parallel Tempering (PT)
ICML paper on PT with generalized paths of distributions
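For readers unfamiliar with PT, the following is a minimal sketch of parallel tempering with a non-reversible "deterministic even-odd" (DEO) swap schedule, a standard non-reversible variant. It is an illustration under stated assumptions, not the implementation from the papers above: the linear annealing path between a reference pi_0 and a target pi, the bimodal toy target, the number of chains, the step size, and all function names are hypothetical choices. Chains at inverse temperatures beta between 0 and 1 explore locally, and neighbouring chains periodically propose to swap states so that well-mixed draws from the reference can travel up to the target.

```python
import math
import random


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_target(x):
    """Hypothetical bimodal target 0.5 N(-4, 1) + 0.5 N(4, 1), via log-sum-exp."""
    a, b = log_normal(x, -4.0, 1.0), log_normal(x, 4.0, 1.0)
    m = max(a, b)
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


def log_reference(x):
    """Hypothetical reference N(0, 5^2), which can be sampled exactly at beta = 0."""
    return log_normal(x, 0.0, 5.0)


def accept(rng, log_ratio):
    """Metropolis-Hastings accept/reject decision in log space."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def nonreversible_pt(n_chains=8, n_iters=20000, step=1.0, seed=0):
    """PT along pi_beta proportional to pi_0^(1 - beta) * pi^beta, with DEO swaps.

    Returns the states visited by the beta = 1 (target) chain."""
    rng = random.Random(seed)
    betas = [i / (n_chains - 1) for i in range(n_chains)]
    xs = [rng.gauss(0.0, 5.0) for _ in range(n_chains)]

    def log_annealed(x, beta):
        return (1.0 - beta) * log_reference(x) + beta * log_target(x)

    def ell(x):
        # log pi(x) - log pi_0(x): the only term that varies along the path.
        return log_target(x) - log_reference(x)

    samples = []
    for it in range(n_iters):
        # Local exploration: exact draw at beta = 0, random-walk Metropolis elsewhere.
        xs[0] = rng.gauss(0.0, 5.0)
        for i in range(1, n_chains):
            prop = xs[i] + step * rng.gauss(0.0, 1.0)
            if accept(rng, log_annealed(prop, betas[i]) - log_annealed(xs[i], betas[i])):
                xs[i] = prop
        # DEO communication: pairs (0,1), (2,3), ... on even iterations,
        # pairs (1,2), (3,4), ... on odd iterations.
        for i in range(it % 2, n_chains - 1, 2):
            log_ratio = (betas[i + 1] - betas[i]) * (ell(xs[i]) - ell(xs[i + 1]))
            if accept(rng, log_ratio):
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
        samples.append(xs[-1])
    return samples
```

A single random-walk chain started in one mode of this target would essentially never cross to the other; in the sketch, the target chain instead receives fresh states that originate as exact reference draws and climb the ladder through accepted swaps. Alternating even and odd swap pairs deterministically (rather than choosing pairs at random) is what makes the communication scheme non-reversible.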
Proliferating cancer cells, in which DNA repair mechanisms are disrupted, accumulate mutations at a much faster rate than healthy cells do. This leads to the emergence of an evolutionary process inside the tumour. A current research frontier is the characterization of the evolutionary dynamics and phylogenies within individual cancer tumours, where multiple sub-populations of cancer cells acquire differentiating sets of mutations.
Phylogenetic inference from single cell copy number alterations
Nature Methods paper on the analysis of single cell data
Nature Methods paper on PyClone, a Bayesian non-parametric deconvolution method for bulk cancer data
We are developing a language and software development kit for doing Bayesian analysis. The design philosophy is centered around the day-to-day requirements of real-world (Bayesian) data science. The inference engine brings to bear several recent advances such as non-reversible methods.
Easy to use distributed Bayesian inference
A modelling language for Bayesian inference over combinatorial spaces
Markov chain Monte Carlo (MCMC) is notoriously difficult to scale to problems having high-dimensional latent variables ("big models"), which arise in many scientific and engineering applications.
We are working on an alternative to standard MCMC algorithms that we call the "Bouncy Particle Sampler" (BPS), which imports ideas from the field of molecular simulation to scale MCMC to high-dimensional problems.
JASA paper
Follow-up Annals of Statistics paper, on the geometric ergodicity of BPS
Preprint of follow-up work, on non-linear trajectories and discrete piecewise deterministic Markov processes
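As a rough illustration of how the BPS moves, here is a minimal sketch (not the reference implementation from the papers above) on a standard Gaussian target with potential U(x) = |x|^2 / 2. The particle travels in straight lines; at the event times of an inhomogeneous Poisson process with rate max(0, <v, grad U(x)>) it "bounces" by reflecting its velocity off the gradient of U, and at the times of an independent Poisson process it refreshes its velocity. The Gaussian target is an assumption chosen because bounce times can then be drawn exactly by inverting the integrated rate; the function name, refresh rate, and seed are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bps_standard_gaussian(dim, n_events, refresh_rate=1.0, seed=0):
    """Bouncy Particle Sampler sketch for N(0, I), i.e. U(x) = |x|^2 / 2.

    Returns time-averaged estimates of E[x_i] and E[x_i^2], integrated
    exactly along each linear segment of the trajectory."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    total_time = 0.0
    sum_x = [0.0] * dim
    sum_x2 = [0.0] * dim
    for _ in range(n_events):
        # Along x + t v the bounce rate is max(0, a + b t), a = <x, v>, b = |v|^2.
        # Solve integral_0^tau max(0, a + b t) dt = E with E ~ Exp(1).
        a, b = dot(x, v), dot(v, v)
        e = rng.expovariate(1.0)
        t_bounce = (-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b
        t_refresh = rng.expovariate(refresh_rate)
        t = min(t_bounce, t_refresh)
        # Integrals of x_i and x_i^2 over the straight segment [0, t].
        for i in range(dim):
            sum_x[i] += x[i] * t + v[i] * t ** 2 / 2.0
            sum_x2[i] += x[i] ** 2 * t + x[i] * v[i] * t ** 2 + v[i] ** 2 * t ** 3 / 3.0
        total_time += t
        x = [xi + t * vi for xi, vi in zip(x, v)]
        if t_refresh < t_bounce:
            # Refreshment: draw a fresh velocity from N(0, I).
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        else:
            # Bounce: reflect v off the level set of U; here grad U(x) = x.
            coef = 2.0 * dot(v, x) / dot(x, x)
            v = [vi - coef * xi for vi, xi in zip(v, x)]
    means = [s / total_time for s in sum_x]
    second_moments = [s / total_time for s in sum_x2]
    return means, second_moments
```

Expectations are estimated by integrating along the continuous trajectory rather than by averaging discrete states. The refreshment step is needed here: without it, the BPS is known not to be ergodic on isotropic Gaussian targets.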
As a result of advances in sequencing technologies, the fields of computational and statistical phylogenetics, which are concerned with the modelling and inference of evolutionary relationships, have been growing rapidly in recent years. I am particularly interested in computationally-intensive Bayesian methods and inference of complex evolutionary models.
Sys Bio paper on a change-of-measure-based phylogenetic SMC algorithm.
Novel sampling method based on Hamiltonian Monte Carlo for parameter-rich evolutionary models.
Long indel model (in Sys Bio) based on the Poisson Indel Process (PNAS).
Phylogenetic trees (or networks, forests, etc.) also play an important role in linguistics, where they describe how language change and splits in ancestral speaker populations gave rise to today's linguistic diversity. Computational methods are also starting to have a major impact in this field.
My main field of research is in computational statistics/statistical machine learning. I am interested in the mathematical side of the subject as well as in applications in linguistics and biology.
On the methodology side, I am interested in Monte Carlo methods such as SMC and MCMC, graphical models, non-parametric Bayesian statistics, randomized algorithms, and variational inference.
My favorite applications, both in linguistics and biology, are related to phylogenetics in one way or another. Some examples of current and recent projects: automated reconstruction of proto-languages; cancer phylogenetics; population genetics; pedigrees; tree and alignment inference.
In the past, I also did some work on machine translation, on logical characterization and approximation of labeled Markov processes, and on reinforcement learning.