News & Events

Subscribe to email list

Please select the email list(s) to which you wish to subscribe.

You are here

Bayesian analysis of continuous time Markov chains with applications to phylogenetics

Tuesday, October 30, 2018 - 11:00 to 12:00
Tingting Zhao, UBC Statistics PhD Student
Statistics Seminar
Room 4192, Earth Sciences Building (2207 Main Mall)

Bayesian analysis of continuous time, discrete state space time series is an important and challenging problem, where incomplete observation and large parameter sets call for user-defined priors based on known properties of the process. Generalized Linear Models (GLM) have a largely unexplored potential to construct such prior distributions. In this talk, we show that an important challenge with Bayesian generalized linear modelling of Continuous Time Markov Chains (CTMCs) is that classical Markov Chain Monte Carlo (MCMC) techniques are too ineffective to be practical in that setup. We propose two computational methods to address this issue. The first algorithm uses an auxiliary variable construction combined with an Adaptive Hamiltonian Monte Carlo (AHMC) algorithm. To further exploit the sparsity structure often found in the parameterization of high-dimensional rate matrices, we make use of recently developed Monte Carlo schemes based on non-reversible Piecewise-deterministic Markov Processes (PDMPs). Our second algorithm combines Hamiltonian Monte Carlo (HMC) and Local Bouncy Particle Sampler (LBPS) to take advantage of the sparsity in certain high-dimensional rate matrices. We propose a characterization for a class of sparse factor graphs, where LBPS can be efficient. We also provide a framework to assess the computational complexity for the algorithm. An important aspect of practical implementation of Bouncy Particle Sampler (BPS) is the simulation of event times. Default implementations use conservative thinning bounds. Such bounds can slow down the algorithm and limit the computational performance. In the second algorithm, we develop exact analytical solutions to the random event times in the context of CTMCs. Both sampling algorithms and our model make it efficient both in terms of computation and analyst's time to construct stochastic processes informed by prior knowledge, such as known properties of the states of the process. We demonstrate the flexibility and scalability of our framework using both synthetic and real phylogenetic protein data.