Lecture 8: SMC samplers and PMCMC

17 Mar 2015

Instructor: Alexandre Bouchard-Côté
Editor: Creagh Briercliffe

Sequential Monte Carlo (SMC)

Motivation: to fix or at least alleviate the issue raised at the end of the last section. Namely, in Sequential Importance Sampling (SIS), the normalized weights of all but one particle converge to 0, so a single particle ends up carrying all of the normalized weight.

Intuition: prune particles of low weights between each SIS iteration (or when the population becomes too unbalanced).

Constraint: we would like to do our pruning in such a way that the consistency of Importance Sampling (IS) is preserved. That is, we would like our estimators to converge (a.s.) to both $Z$ (the normalizing constant) and $\int f(x) \pi(x) \ud x$ as the number of particles goes to infinity.

Solution: Resampling will be a way to prune particles while preserving consistency. The simplest scheme is called multinomial resampling. We explain it via an example.

Multinomial resampling:

Suppose we are given an urn containing balls of different colors.
This urn can be viewed as a probability distribution over colors.
To approximate this probability distribution, I draw 100 times with replacement and assign to each color the fraction of times I drew this color.
More abstractly, this process can take any probability measure as an input, and create an approximation distribution which can be viewed as 100 equally weighted particles.

In SMC, the balls are the particle population from the previous generation, and instead of 100 draws we pick the number of particles as the number of resampling draws.
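The urn picture can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name `multinomial_resample` and the toy urn are assumptions made for the example, not something taken from the lecture.

```python
import random

def multinomial_resample(particles, weights, rng):
    """Draw len(particles) particles with replacement, proportionally to weights."""
    n = len(particles)
    # random.choices accepts unnormalized, non-negative relative weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # After resampling, every particle carries the same weight.
    return resampled, [1.0 / n] * n

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
urn = ["red", "green", "blue"]  # hypothetical particle population
weights = [0.7, 0.2, 0.1]
resampled, new_weights = multinomial_resample(urn, weights, rng)
```

The number of draws equals the number of particles, as described above, so the population size stays constant from one generation to the next.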

Theoretical challenge: this creates interaction/dependencies across particles, making it harder to prove consistency than in the IS or SIS setups.

Another picture for multinomial resampling: throwing darts uniformly at a colored stick, where the length of each colored segment is proportional to the weight of that color. Assign to each color the fraction of darts that land on it.

Lower variance resampling alternatives:

Stratified resampling: split the colored stick into bins of equal size and throw one dart uniformly in each bin.
Systematic resampling: same as stratified resampling, but we reuse a single uniform random number as the position of the dart within every bin, thereby deterministically linking all of the samples.
For more, see: http://biblio.telecom-paristech.fr/cgi-bin/download.cgi?id=5755.
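The two schemes above differ only in how the dart positions are generated, which the following sketch tries to make explicit. It assumes the stick is the interval $[0, 1)$ cut into segments proportional to the weights; the helper names (`_invert_cdf`, `stratified_indices`, `systematic_indices`) are hypothetical and chosen for this illustration.

```python
import random
from itertools import accumulate

def _invert_cdf(weights, points):
    """Map sorted points in [0, 1) to particle indices through the weight CDF."""
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    indices, j = [], 0
    for u in points:
        # Advance to the first segment whose right edge exceeds u.
        while j < len(cdf) - 1 and u >= cdf[j]:
            j += 1
        indices.append(j)
    return indices

def stratified_indices(weights, rng):
    n = len(weights)
    # One independent dart per bin [i/n, (i+1)/n).
    points = [(i + rng.random()) / n for i in range(n)]
    return _invert_cdf(weights, points)

def systematic_indices(weights, rng):
    n = len(weights)
    u = rng.random()  # a single offset, shared by every bin
    points = [(i + u) / n for i in range(n)]
    return _invert_cdf(weights, points)

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
equal = [1.0, 1.0, 1.0, 1.0]
strat = stratified_indices(equal, rng)
syst = systematic_indices(equal, rng)
```

With equal weights each bin coincides with one segment, so both schemes return every particle exactly once, whereas multinomial resampling would typically duplicate some particles and drop others.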

Note: only use resampling when it is really needed, i.e. when there are indeed many low weight particles in the current iteration. A useful strategy is to monitor the effective sample size (ESS) at each iteration, given by the following.

\begin{eqnarray} ESS := \frac{\left(\sum_{i=1}^N w^i\right)^2}{\sum_{i=1}^N (w^i)^2 }, \end{eqnarray}

which attains its maximum, $N$, when all weights are equal ($1/N$ each after normalization), and its minimum, $1$, when one particle has all of the mass. The strategy is to resample only when the ESS is smaller than $N/2$.
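As a small sketch of this monitoring rule, the ESS and the resampling test can be computed as follows. The function names and the `threshold` parameter are assumptions made for the example; `threshold=0.5` corresponds to the $N/2$ rule.

```python
def effective_sample_size(weights):
    """ESS = (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def should_resample(weights, threshold=0.5):
    """Resample only when the ESS drops below threshold * N."""
    return effective_sample_size(weights) < threshold * len(weights)

balanced = [0.25, 0.25, 0.25, 0.25]   # ESS is N = 4
degenerate = [1.0, 0.0, 0.0, 0.0]     # ESS is 1
decisions = (should_resample(balanced), should_resample(degenerate))
```

Because the formula is a ratio, the weights do not need to be normalized before calling `effective_sample_size`.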

Another view of SMC

Alternatively, we can view SMC as an algorithm using importance sampling to first approximate $\pi_1$, then approximate $\pi_2$, etc. Let us denote these intermediate approximations by $\tilde \pi_t$ for the $t^{th}$ approximation.

Question: what proposal should we use?

Idea: to construct the proposal, use $\tilde \pi_{t-1}$ and a transition proposal $q(x_t|x_{t-1})$, where we assume that we can sample from $q$ and evaluate $q(x_t|x_{t-1})$ with the correct normalization.

Details: See slides.
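As a reminder of where this construction leads, here is a sketch of the usual importance weight for this proposal. It assumes notation not introduced in this section: $\gamma_t$ denotes the unnormalized version of $\pi_t$, and $w_{t-1}$ is the weight of the particle at the previous iteration (equal across particles right after a resampling step).

\begin{eqnarray} w_t = w_{t-1} \frac{\gamma_t(x_{1:t})}{\gamma_{t-1}(x_{1:t-1}) \, q(x_t|x_{t-1})} \end{eqnarray}

The denominator is the unnormalized density of the proposal, built from the previous target and the transition proposal $q$; the slides remain the reference for the derivation.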

SMC Samplers: SMC on general spaces

We lift the assumption that $F_t = E_1 \times E_2 \times \dots \times E_t$. Let us now assume that $F_t$ is an arbitrary space.

Motivations:

Phylogenetic inference on non-clock trees. See the slides and The Phylogenetic Handbook by Lemey, Salemi, & Vandamme.
Annealing, $\pi_t \propto p(x) (\ell(y | x))^{\alpha_t}$, where $\alpha_t$ increases with $t$ up to $\alpha_n = 1$. See Annealed Importance Sampling by Radford Neal.
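To make the annealing sequence concrete, the sketch below evaluates the log of the unnormalized annealed target for an increasing sequence of exponents. Everything model-specific here is a hypothetical choice for illustration: a standard normal prior, a unit-variance Gaussian likelihood, and a linear schedule $\alpha_t = t/n$.

```python
import math

def log_prior(x):
    # Hypothetical prior: standard normal density on x.
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_likelihood(y, x):
    # Hypothetical likelihood: y observed with unit-variance Gaussian noise around x.
    return -0.5 * (y - x) ** 2 - 0.5 * math.log(2.0 * math.pi)

def log_gamma(x, y, alpha):
    """Log of the unnormalized annealed target p(x) * l(y | x) ** alpha."""
    return log_prior(x) + alpha * log_likelihood(y, x)

def linear_schedule(n):
    """Increasing exponents alpha_1 < ... < alpha_n = 1."""
    return [t / n for t in range(1, n + 1)]

alphas = linear_schedule(4)
values = [log_gamma(0.5, 2.0, a) for a in alphas]
```

At $\alpha = 0$ the target reduces to the prior, and at $\alpha = 1$ it is the unnormalized posterior, so the sequence moves gradually from an easy distribution to the one of interest.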

Why do we need special consideration for non-product spaces? Example: overcounting in discrete state spaces. See the slides for an example where there is more than one way to build the same tree structure.

Solution: For the purpose of analysis, we build auxiliary spaces and distributions as follows.

$\tilde F_t = F_1 \times F_2 \times \dots \times F_t$
construct a backward transition model, $q^-(x_{t-1}|x_t)$ for $x_i \in F_i$ (more on that soon)
for any $x_{1:t}$, $x_i \in F_i$, set $\tilde \gamma_t(x_{1:t})$ as follows:

\begin{eqnarray} \tilde \gamma_t(x_{1:t}) = \gamma_t(x_t) q^-(x_{t-1}|x_t) q^-(x_{t-2}|x_{t-1}) \dots q^-(x_{1}|x_2) \end{eqnarray}

The resulting framework is called an SMC sampler. For more information see Section 3 of the Sequential Monte Carlo Samplers paper by Del Moral, Doucet & Jasra.

Exercises:

Marginalize $\tilde \pi_t(x_{1:t})$ over $x_{1:t-1}$, where $\tilde \pi_t \propto \tilde \gamma_t$.
Since $\tilde \gamma_t$ is now defined over a product space $\tilde F_t$, we can use standard SMC on that auxiliary construction. Find the weight update of a standard SMC algorithm targeting $\tilde \gamma_t$.