Lecture 7: Sequential Monte Carlo
Instructor: Alexandre Bouchard-Côté
Editor: TBA
Review of importance sampling (IS)
- IS with known normalization constants.
- IS without the normalization constants (self-normalizing IS).
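The self-normalizing variant can be sketched in a few lines. This is a toy illustration of our choosing (not from the lecture): the unnormalized target $\gamma$ is an unnormalized N(2, 1) density, and the proposal $q$ is N(0, 2).

```python
import numpy as np

rng = np.random.default_rng(1)

# Self-normalized IS: estimate E_pi[f(X)] when pi = gamma / Z with Z unknown.
# Toy setup (illustrative choice): gamma is an unnormalized N(2, 1) density,
# the proposal q is N(0, 2).
def gamma_fn(x):
    return np.exp(-0.5 * (x - 2.0) ** 2)       # unnormalized target

def q_logpdf(x):
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

N = 100_000
xs = rng.normal(0.0, 2.0, size=N)              # draws from q
w = gamma_fn(xs) / np.exp(q_logpdf(xs))        # unnormalized weights gamma/q
w_norm = w / w.sum()                           # self-normalization removes Z

est_mean = np.sum(w_norm * xs)                 # estimates E_pi[X] = 2
est_Z = w.mean()                               # estimates Z = sqrt(2*pi)
```

Note that the same weights give both the self-normalized estimator of the mean and, via their plain average, an estimate of the normalization constant $Z$.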
Sequential Monte Carlo (SMC) on product spaces
Examples
Let us start with some examples where SMC is useful in Bayesian statistics:
- State space models from assignment 1:
- $n$: time index (day)
- Observed number of text messages: $y_{1:n} = (y_1, \dots, y_n)$, $y_i \in \{0, 1, 2, \dots\}$
- Latent category $x_{1:n}$, $x_i \in \{0, 1\}$ (note: $x$ was denoted $z$ earlier on)
- Genomics:
- $n$: positions on a chromosome.
- Observed single nucleotide polymorphisms (SNP): $y_{1:n}$, $y_i \in \{\textrm{AA}, \textrm{Aa}, \textrm{aa}\}$
- Latent haploblock $x_{1:n}$. A haploblock is a chunk of the genome with SNP states shared by several individuals. Since recombination events are relatively rare, well documented haploblocks are available; at each position $i$, $x_i \in E_i$, where $E_i$ is some discrete set.
- Ultrametric (clock) phylogenetic trees.
- Species: $S = \{1, 2, \dots, s\}$.
- $E_i$ contains a partition of $S$ into $s-i$ blocks (to encode the topology of the tree after the $i$-th speciation event), and a real number (the speciation time).
Common feature: the latent space is a product space $F_t = E_1 \times E_2 \times \dots \times E_t$ indexed by the integers $t\in\{1, \dots, n\}$.
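To make the first example concrete, here is a forward simulation of a toy state space model in the spirit of the text-message example. All parameters (the stay probability and the Poisson rates) are illustrative choices, not values from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy state space model: latent x_t in {0, 1} is a two-state Markov chain,
# and y_t | x_t is Poisson with a state-dependent rate.
# All parameters below are illustrative, not from the lecture.
n = 30
stay = 0.95                      # P(x_t = x_{t-1})
rates = np.array([2.0, 10.0])    # Poisson rate for state 0 and state 1

x = np.zeros(n, dtype=int)
x[0] = rng.integers(2)           # uniform initial state
for t in range(1, n):
    x[t] = x[t - 1] if rng.random() < stay else 1 - x[t - 1]
y = rng.poisson(rates[x])        # observed counts y_{1:n}
```

Here the latent space is exactly the product space $F_n = \{0,1\}^n$, and inference targets the posterior over $x_{1:n}$ given $y_{1:n}$.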
Notes:
- We may only care about the probability distribution $\pi = \pi_n$ defined on $F = F_n$, called the target; the others ($t < n$) are called intermediate distributions.
- This setup was historically the motivation for SMC methods.
- However, it was discovered in the 2000s that SMC also applies to situations where this is not the case. Let us start by assuming $F$ is a product space; we will get to the general construction later on.
Notation and goal
Target distribution, with density $\pi(x) = \gamma(x)/Z$ (note: $Z$ was denoted $C$ last time; $x$ was $z$)
Goals:
- Computing $Z$ (e.g. to perform model selection)
- Computing $\int f(x) \pi(x) \ud x$, where $f$ is a test function.
For example, in the context of Bayesian inference, $Z = m(\textrm{data})$, the marginal likelihood studied last week, and if $f_i(x_{1:t}) = x_i$, then $\int f_i(x) \pi_t(x) \ud x$ is the posterior mean $\E[X_i | Y_{1:t}]$.
Sequential Importance Sampling (SIS)
Based on two simple identities:
\begin{eqnarray} \gamma(x_{1:n}) = \frac{\gamma(x_{1:n})}{\gamma(x_{1:n-1})} \frac{\gamma(x_{1:n-1})}{\gamma(x_{1:n-2})} \dots \frac{\gamma(x_{1:1})}{\gamma(x_{\emptyset})}, \end{eqnarray}
and
\begin{eqnarray} q(x_{1:n}) = q(x_1 | x_\emptyset) q(x_2 | x_{1:1}) q(x_3 | x_{1:2}) \dots q(x_n | x_{1:n-1}), \end{eqnarray}
we can write an iterative version of importance sampling (with the conventions $\gamma(x_\emptyset) = 1$ and $q(x_1 | x_\emptyset) = q(x_1)$), where at each iteration $t = 1, 2, \dots, n$, we carry a population of particles $x_{1:t}^1, \dots, x_{1:t}^N$ with corresponding unnormalized weights $w_t^1, \dots, w_t^N$ (see slides).
In SIS, we propose incrementally:
\begin{eqnarray} x_t^i &\sim q(\cdot | x_{1:t-1}^i) \\ x_{1:t}^i &= (x_{1:t-1}^i, x_t^i), \end{eqnarray}
and update the weights using:
\begin{eqnarray} w_t^i &= w_{t-1}^i \frac{\gamma(x_{1:t}^i)}{\gamma(x_{1:t-1}^i)} \frac{1}{q(x_t^i | x_{1:t-1}^i)}. \end{eqnarray}
Exercise: compute the weights at the last iteration, $w_n^i$. What is the implication?
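The SIS recursion can be sketched for a two-state HMM with Poisson emissions (all parameters and observations below are made up for illustration). Taking the proposal $q$ to be the prior transition kernel, the incremental weight $\frac{\gamma(x_{1:t})}{\gamma(x_{1:t-1})}\frac{1}{q(x_t | x_{1:t-1})}$ reduces to the emission likelihood $p(y_t | x_t)$.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# SIS for a two-state HMM with Poisson emissions (illustrative parameters).
# Proposal q = prior transition kernel, so the incremental weight
#   gamma(x_{1:t}) / (gamma(x_{1:t-1}) q(x_t | x_{1:t-1}))
# reduces to the emission likelihood p(y_t | x_t).
stay = 0.95
rates = np.array([2.0, 10.0])
y = np.array([1, 3, 2, 8, 11, 9, 12, 2, 1, 0])  # made-up observations

def pois_pmf(y_t, rate):
    # Poisson pmf; counts are small so a direct factorial is fine
    return np.exp(-rate) * rate ** y_t / math.factorial(int(y_t))

N = 5000
x = rng.integers(2, size=N)                  # x_1^i ~ uniform initial law
w = pois_pmf(y[0], rates[x])                 # w_1^i = p(y_1 | x_1^i)
for t in range(1, len(y)):
    flip = rng.random(N) >= stay             # propose x_t^i ~ q(. | x_{t-1}^i)
    x = np.where(flip, 1 - x, x)
    w = w * pois_pmf(y[t], rates[x])         # multiplicative weight update

W = w / w.sum()                              # normalized weights
post_mean_last = np.sum(W * x)               # estimate of E[X_n | y_{1:n}]
Z_hat = w.mean()                             # estimate of the marginal likelihood
```

Note that only the current state $x_t^i$ is needed in the weight update here, a consequence of the Markov structure; in general the full trajectory $x_{1:t}^i$ must be stored.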
The sequential nature of the particle recursions makes it tempting to use SIS in an online setting. This is a bad idea! As we will see, the approximation degrades exponentially fast in the number of time steps $n$. Symptom: all the normalized weights converge to zero except for one particle, which takes all of the normalized weight.
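The weight degeneracy can be seen in a stylized simulation. Below, each incremental weight is an i.i.d. lognormal factor, a stand-in (our simplification, not a model from the lecture) for the ratios $\gamma/q$; the effective sample size $\textrm{ESS} = 1/\sum_i (\bar{W}_t^i)^2$ collapses as $t$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized SIS weight degeneracy: multiply each particle's weight by an
# i.i.d. lognormal factor at every step (a stand-in for gamma/q ratios)
# and watch the effective sample size ESS = 1 / sum_i W_i^2 collapse.
N, n = 1000, 100
logw = np.zeros(N)
for t in range(n):
    logw += rng.normal(0.0, 1.0, size=N)   # log of a random weight factor

W = np.exp(logw - logw.max())              # stabilize before normalizing
W /= W.sum()
ess = 1.0 / np.sum(W ** 2)                 # between 1 and N
max_w = W.max()                            # fraction held by the top particle
```

After $n = 100$ steps, the ESS is a tiny fraction of $N = 1000$ and a single particle holds most of the normalized weight, which is exactly the symptom described above.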
Examples of weight updates:
- HMM. (exercise)
- A coalescent model.