Markov chain Monte Carlo (MCMC) is notoriously difficult to scale to problems with high-dimensional latent variables ("big models"), which arise in many scientific and engineering applications.
We are developing a new perspective on parallel tempering (PT) algorithms and their tuning, based on two main insights. First, we identified and formalized a sharp divide in the behaviour and performance of reversible versus non-reversible PT methods. Second, we are analyzing the behaviour of PT algorithms in a novel asymptotic regime in which the number of parallel compute cores goes to infinity. (Preprint)
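To make the reversible/non-reversible distinction concrete, here is a minimal sketch. It assumes a toy one-dimensional bimodal target, a Gaussian reference, a linear annealing path, equally spaced inverse temperatures and random-walk Metropolis local moves; all of these are hypothetical choices for illustration, not the setup of the preprint. Both variants propose swaps between adjacent chains in alternating even and odd pairs. The non-reversible variant alternates even and odd rounds deterministically, while the reversible variant picks the parity at random each round. The hypothetical function `run_pt` also counts round trips of replicas from the reference to the target and back, a common performance measure for PT.

```python
import math
import random


def log_ref(x):
    # Hypothetical reference distribution: N(0, 5^2), up to a constant.
    return -0.5 * (x / 5.0) ** 2


def log_target(x):
    # Hypothetical bimodal target: equal mixture of N(-4, 0.5^2) and N(4, 0.5^2),
    # unnormalized, evaluated with a log-sum-exp for numerical stability.
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_tempered(x, beta):
    # Linear path in log space: reference at beta = 0, target at beta = 1.
    return (1.0 - beta) * log_ref(x) + beta * log_target(x)


def log_swap_ratio(x_lo, x_hi, beta_lo, beta_hi):
    # Log Metropolis ratio for exchanging the states of two adjacent chains;
    # the reference terms cancel, leaving (beta_hi - beta_lo) * (ell(x_lo) - ell(x_hi)).
    def ell(x):
        return log_target(x) - log_ref(x)
    return (beta_hi - beta_lo) * (ell(x_lo) - ell(x_hi))


def run_pt(n_chains=8, n_rounds=2000, non_reversible=True, seed=0):
    rng = random.Random(seed)
    betas = [i / (n_chains - 1) for i in range(n_chains)]
    xs = [rng.gauss(0.0, 5.0) for _ in range(n_chains)]
    replica_at = list(range(n_chains))  # replica_at[i] = id of the replica at chain i
    last_end = [None] * n_chains        # last endpoint ("ref"/"target") seen by each replica
    round_trips = 0
    samples = []
    for t in range(n_rounds):
        # 1) Local exploration: one random-walk Metropolis step per chain.
        for i, beta in enumerate(betas):
            prop = xs[i] + rng.gauss(0.0, 1.0)
            log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1]
            if log_u < log_tempered(prop, beta) - log_tempered(xs[i], beta):
                xs[i] = prop
        # 2) Communication: deterministic (non-reversible) or random (reversible) parity.
        parity = t % 2 if non_reversible else rng.randrange(2)
        for i in range(parity, n_chains - 1, 2):
            log_u = math.log(1.0 - rng.random())
            if log_u < log_swap_ratio(xs[i], xs[i + 1], betas[i], betas[i + 1]):
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                replica_at[i], replica_at[i + 1] = replica_at[i + 1], replica_at[i]
        # 3) Count completed reference -> target -> reference round trips.
        r0, rtop = replica_at[0], replica_at[-1]
        if last_end[r0] == "target":
            round_trips += 1
        last_end[r0] = "ref"
        if last_end[rtop] == "ref":
            last_end[rtop] = "target"
        samples.append(xs[-1])  # states of the target chain (beta = 1)
    return samples, round_trips
```

Running `run_pt(non_reversible=True)` and `run_pt(non_reversible=False)` with the same settings and comparing the returned round-trip counts is one rough, single-run way to observe the difference between the two communication schemes on this toy problem.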
We are working on an alternative to traditional reversible MCMC samplers that we call the "Bouncy Particle Sampler" (BPS), which imports ideas from the field of molecular simulation to scale MCMC to high-dimensional problems. (JASA paper)
A follow-up paper in the Annals of Statistics studies the geometric ergodicity of the BPS.
A preprint of further follow-up work covers non-linear trajectories and discrete piecewise deterministic Markov processes.
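The basic BPS mechanism can be illustrated on a standard Gaussian target, where bounce times can be simulated exactly. The sketch below assumes energy U(x) = |x|^2 / 2, a constant refreshment rate and Gaussian velocity refreshment; the function name `bps_gaussian` and its parameters are hypothetical, and this is a textbook special case rather than the group's software. Between events the particle moves in a straight line; it bounces (reflects its velocity off the gradient of U) at the arrival times of a Poisson process with rate max(0, <v, grad U(x)>), and its velocity is occasionally redrawn so that the process explores the whole space. Expectations are estimated by exact time averages along the piecewise linear trajectory.

```python
import math
import random


def bps_gaussian(dim=2, horizon=5000.0, refresh_rate=1.0, seed=0):
    """Minimal Bouncy Particle Sampler for a standard Gaussian target N(0, I).

    Energy U(x) = |x|^2 / 2, so grad U(x) = x. Returns the time average of
    |x|^2 / dim along the trajectory (expected to be near 1) and the number
    of bounce events.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    t, integral, n_bounces = 0.0, 0.0, 0
    while t < horizon:
        a = sum(vi * xi for vi, xi in zip(v, x))  # <v, grad U(x)>
        b = sum(vi * vi for vi in v)              # |v|^2
        # Along x + v s the bounce rate is max(0, a + b s); invert its
        # integral against an Exp(1) draw to get the next bounce time exactly.
        e = rng.expovariate(1.0)
        tau_bounce = (-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b
        tau_refresh = rng.expovariate(refresh_rate)
        event = "refresh" if tau_refresh < tau_bounce else "bounce"
        tau = min(tau_refresh, tau_bounce)
        if tau >= horizon - t:                    # truncate the last segment at the horizon
            tau, event = horizon - t, None
        # Exact integral of |x + v s|^2 over the straight segment [0, tau].
        xx = sum(xi * xi for xi in x)
        integral += xx * tau + a * tau ** 2 + b * tau ** 3 / 3.0
        x = [xi + vi * tau for xi, vi in zip(x, v)]
        t += tau
        if event is None:
            break
        if event == "refresh":
            # Refreshment: redraw the velocity from N(0, I).
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        else:
            # Bounce: reflect v off the hyperplane orthogonal to grad U(x) = x.
            g2 = sum(xi * xi for xi in x)
            vg = sum(vi * xi for vi, xi in zip(v, x))
            v = [vi - 2.0 * vg / g2 * xi for vi, xi in zip(v, x)]
            n_bounces += 1
    return integral / (horizon * dim), n_bounces
```

For example, `bps_gaussian(dim=2, horizon=5000.0, seed=0)` should return a first value close to 1, the per-coordinate variance of the target. For targets without closed-form bounce times, event times are typically simulated by thinning or superposition instead.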
We are developing a language and software development kit for Bayesian analysis. The design philosophy is centered on the day-to-day requirements of real-world (Bayesian) data science. The inference engine brings to bear several recent advances, such as non-reversible methods.
As a result of advances in sequencing technologies, the fields of computational and statistical phylogenetics, which are concerned with the modelling and inference of evolutionary relationships, have been growing rapidly in recent years. I am particularly interested in computationally intensive Bayesian methods and in inference for complex evolutionary models.
Proliferating cancer cells in which DNA repair mechanisms are disrupted accumulate mutations at a much faster rate than healthy cells do. This leads to the emergence of an evolutionary process inside the tumour. A current research frontier is the characterization of the evolutionary dynamics and phylogenies within individual tumours, where multiple sub-populations of cancer cells acquire differentiating sets of mutations.
Phylogenetic trees (or networks, forests, etc.) also play an important role in linguistics, where they describe how language change and splits in ancestral speaker populations gave rise to today's linguistic diversity. Computational methods are also starting to play an important role in this field.
My main field of research is computational statistics and statistical machine learning. I am interested in the mathematical side of the subject as well as in applications in linguistics and biology.
On the methodology side, I am interested in Monte Carlo methods such as sequential Monte Carlo (SMC) and MCMC, graphical models, non-parametric Bayesian statistics, randomized algorithms, and variational inference.
My favorite applications, both in linguistics and biology, are related to phylogenetics in one way or another. Examples of current and recent work include automated reconstruction of proto-languages; cancer phylogenetics; population genetics; and inference of pedigrees, trees and alignments.
In the past, I also did some work on machine translation, on logical characterization and approximation of labeled Markov processes, and on reinforcement learning.