Phylogenetic inference aims to reconstruct the evolutionary history of populations or species. With the rapid growth of available genetic data, statistical methods play an increasingly important role in phylogenetic inference. In this talk, we present new evolutionary models, statistical inference methods and efficient algorithms for reconstructing phylogenetic trees at the level of populations, using single-nucleotide polymorphism data, and at the level of species, using multiple sequence alignment data.
At the level of populations, we introduce a new inference method that uses single-nucleotide polymorphism allele frequencies to estimate the evolutionary distances from any two populations to their most recent common ancestral population. Our method is based on a new evolutionary model that accounts for both drift and fixation. To scale this method to large numbers of populations, we introduce the asymmetric neighbor-joining algorithm, an efficient method for reconstructing rooted bifurcating trees.
At the level of species, we introduce the geometric Poisson indel process, a continuous-time stochastic process that allows indel rates to vary across sites. We design an efficient algorithm for computing the probability of a given multiple sequence alignment under this indel model, and we describe a method for constructing phylogeny estimates from a fixed alignment using neighbor-joining.
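For orientation, the neighbor-joining step can be illustrated with the classic textbook procedure of Saitou and Nei, which repeatedly joins the pair of nodes minimizing the Q criterion. The sketch below is that standard algorithm only; it is not the asymmetric neighbor-joining variant, and it does not derive distances from the indel model. The function name, the dictionary-based distance format and the toy distances are assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Classic neighbor-joining on a symmetric distance table (illustrative sketch).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i] is the sum of distances from i to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r[i] - r[j].
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"internal{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Toy distances between five hypothetical taxa (not data from the talk).
names = ["a", "b", "c", "d", "e"]
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(p): float(v) for p, v in pairs.items()}
edges = neighbor_joining(names, dist)
for node, neighbor, length in edges:
    print(f"{node} -- {neighbor}: {length:.2f}")
```

Note that the classic procedure returns an unrooted tree, whereas the asymmetric variant described in the talk targets rooted bifurcating trees; placing the root would require additional information that this sketch does not model.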