Lecture 7: Sequential Monte Carlo
Instructor: Alexandre Bouchard-Côté
Editor: TBA
Review of importance sampling (IS)
- IS with known normalization constants.
- IS without the normalization constants (self-normalizing IS).
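The self-normalizing variant can be sketched in a few lines. This is a toy illustration of our choosing (not from the lecture): the unnormalized target $\gamma$ is an unnormalized N(2, 1) density, and the proposal $q$ is N(0, 2).

```python
import numpy as np

rng = np.random.default_rng(1)

# Self-normalized IS: estimate E_pi[f(X)] when pi = gamma / Z with Z unknown.
# Toy setup (illustrative choice): gamma is an unnormalized N(2, 1) density,
# the proposal q is N(0, 2).
def gamma_fn(x):
    return np.exp(-0.5 * (x - 2.0) ** 2)       # unnormalized target

def q_logpdf(x):
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))

N = 100_000
xs = rng.normal(0.0, 2.0, size=N)              # draws from q
w = gamma_fn(xs) / np.exp(q_logpdf(xs))        # unnormalized weights gamma/q
w_norm = w / w.sum()                           # self-normalization removes Z

est_mean = np.sum(w_norm * xs)                 # estimates E_pi[X] = 2
est_Z = w.mean()                               # estimates Z = sqrt(2*pi)
```

Note that the same weights give both the self-normalized estimator of the mean and, via their plain average, an estimate of the normalization constant $Z$.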
Sequential Monte Carlo (SMC) on product spaces
Examples
Let us start with some examples where SMC is useful in Bayesian statistics:
- State space models from assignment 1:
- $n$: time index (day)
- Observed number of text messages: $y_{1:n} = (y_1, \dots, y_n)$, $y_i \in \{0, 1, 2, \dots\}$
- Latent category $x_{1:n}$, $x_i \in \{0, 1\}$ (note: $x$ was denoted $z$ earlier on)
- Genomics:
- $n$: positions on a chromosome.
- Observed single nucleotide polymorphisms (SNP): $y_{1:n}$, $y_i \in \{\textrm{AA}, \textrm{Aa}, \textrm{aa}\}$
- Latent haploblock $x_{1:n}$. A haploblock is a chunk of the genome with SNP states shared by several individuals. Since recombination events are relatively rare, well documented haploblocks are available; at each position $i$, $x_i \in E_i$, where $E_i$ is some discrete set.
- Ultrametric (clock) phylogenetic trees.
- Species: $S = \{1, 2, \dots, s\}$.
- $E_i$ contains a partition of $S$ into $s-i$ blocks (to encode the topology of the tree after the $i$-th speciation event), and a real number (the speciation time).
Common feature: the latent space is a product space $F_t = E_1 \times E_2 \times \dots \times E_t$ indexed by the integers $t\in\{1, \dots, n\}$.
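To make the first example concrete, here is a forward simulation of a toy state space model in the spirit of the text-message example. All parameters (the stay probability and the Poisson rates) are illustrative choices, not values from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy state space model: latent x_t in {0, 1} is a two-state Markov chain,
# and y_t | x_t is Poisson with a state-dependent rate.
# All parameters below are illustrative, not from the lecture.
n = 30
stay = 0.95                      # P(x_t = x_{t-1})
rates = np.array([2.0, 10.0])    # Poisson rate for state 0 and state 1

x = np.zeros(n, dtype=int)
x[0] = rng.integers(2)           # uniform initial state
for t in range(1, n):
    x[t] = x[t - 1] if rng.random() < stay else 1 - x[t - 1]
y = rng.poisson(rates[x])        # observed counts y_{1:n}
```

Here the latent space is exactly the product space $F_n = \{0,1\}^n$, and inference targets the posterior over $x_{1:n}$ given $y_{1:n}$.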
Notes:
- We may only care about the probability distribution $\pi = \pi_n$ defined on $F = F_n$, called the target; the others ($t < n$) are called intermediate distributions.
- This setup was historically the motivation for SMC methods.
- However, it was discovered in the 2000s that SMC also applies to situations where this is not the case. Let us start by assuming $F$ is a product space; we will get to the general construction later on.
Notation and goal
Target distribution, with density $\pi(x) = \gamma(x)/Z$ (note: $Z$ was denoted $C$ last time; $x$ was $z$)
Goals:
- Computing $Z$ (e.g. to perform model selection)
- Computing $\int f(x) \pi(x) \ud x$, where $f$ is a test function.
For example, in the context of Bayesian inference, $Z = m(\textrm{data})$, the marginal likelihood studied last week, and if $f_i(x_{1:t}) = x_i$, then $\int f_i(x) \pi_t(x) \ud x$ is the posterior mean $\E[X_i | Y_{1:t}]$.
Sequential Importance Sampling (SIS)
Based on two simple identities:
\begin{eqnarray} \gamma(x_{1:n}) = \frac{\gamma(x_{1:n})}{\gamma(x_{1:n-1})} \frac{\gamma(x_{1:n-1})}{\gamma(x_{1:n-2})} \dots \frac{\gamma(x_{1:1})}{\gamma(x_{\emptyset})}, \end{eqnarray}
and
\begin{eqnarray} q(x_{1:n}) = q(x_1 | x_\emptyset) q(x_2 | x_{1:1}) q(x_3 | x_{1:2}) \dots q(x_n | x_{1:n-1}), \end{eqnarray}
we can write an iterative version of importance sampling (with the conventions $\gamma(x_\emptyset) = 1$ and $q(x_1 | x_\emptyset) = q(x_1)$), where at each iteration $t = 1, 2, \dots, n$, we carry a population of particles $x_{1:t}^1, \dots, x_{1:t}^N$ with corresponding unnormalized weights $w_t^1, \dots, w_t^N$ (see slides).
In SIS, we propose incrementally:
\begin{eqnarray} x_t^i &\sim q(\cdot | x_{1:t-1}^i) \\ x_{1:t}^i &= (x_{1:t-1}^i, x_t^i), \end{eqnarray}
and update the weights using:
\begin{eqnarray} w_t^i &= w_{t-1}^i \frac{\gamma(x_{1:t}^i)}{\gamma(x_{1:t-1}^i)} \frac{1}{q(x_t^i | x_{1:t-1}^i)}. \end{eqnarray}
Exercise: compute the weights at the last iteration, $w_n^i$. What is the implication?
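The SIS recursion can be sketched for a two-state HMM with Poisson emissions (all parameters and observations below are made up for illustration). Taking the proposal $q$ to be the prior transition kernel, the incremental weight $\frac{\gamma(x_{1:t})}{\gamma(x_{1:t-1})}\frac{1}{q(x_t | x_{1:t-1})}$ reduces to the emission likelihood $p(y_t | x_t)$.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# SIS for a two-state HMM with Poisson emissions (illustrative parameters).
# Proposal q = prior transition kernel, so the incremental weight
#   gamma(x_{1:t}) / (gamma(x_{1:t-1}) q(x_t | x_{1:t-1}))
# reduces to the emission likelihood p(y_t | x_t).
stay = 0.95
rates = np.array([2.0, 10.0])
y = np.array([1, 3, 2, 8, 11, 9, 12, 2, 1, 0])  # made-up observations

def pois_pmf(y_t, rate):
    # Poisson pmf; counts are small so a direct factorial is fine
    return np.exp(-rate) * rate ** y_t / math.factorial(int(y_t))

N = 5000
x = rng.integers(2, size=N)                  # x_1^i ~ uniform initial law
w = pois_pmf(y[0], rates[x])                 # w_1^i = p(y_1 | x_1^i)
for t in range(1, len(y)):
    flip = rng.random(N) >= stay             # propose x_t^i ~ q(. | x_{t-1}^i)
    x = np.where(flip, 1 - x, x)
    w = w * pois_pmf(y[t], rates[x])         # multiplicative weight update

W = w / w.sum()                              # normalized weights
post_mean_last = np.sum(W * x)               # estimate of E[X_n | y_{1:n}]
Z_hat = w.mean()                             # estimate of the marginal likelihood
```

Note that only the current state $x_t^i$ is needed in the weight update here, a consequence of the Markov structure; in general the full trajectory $x_{1:t}^i$ must be stored.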
The sequential nature of the particle recursions makes it tempting to use SIS in an online setting. This is a bad idea! As we will see, the approximation degrades exponentially fast in the number of time steps $n$. Symptom: all the normalized weights converge to zero except for one particle, which takes all of the normalized weight.
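The weight degeneracy can be seen in a stylized simulation. Below, each incremental weight is an i.i.d. lognormal factor, a stand-in (our simplification, not a model from the lecture) for the ratios $\gamma/q$; the effective sample size $\textrm{ESS} = 1/\sum_i (\bar{W}_t^i)^2$ collapses as $t$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized SIS weight degeneracy: multiply each particle's weight by an
# i.i.d. lognormal factor at every step (a stand-in for gamma/q ratios)
# and watch the effective sample size ESS = 1 / sum_i W_i^2 collapse.
N, n = 1000, 100
logw = np.zeros(N)
for t in range(n):
    logw += rng.normal(0.0, 1.0, size=N)   # log of a random weight factor

W = np.exp(logw - logw.max())              # stabilize before normalizing
W /= W.sum()
ess = 1.0 / np.sum(W ** 2)                 # between 1 and N
max_w = W.max()                            # fraction held by the top particle
```

After $n = 100$ steps, the ESS is a tiny fraction of $N = 1000$ and a single particle holds most of the normalized weight, which is exactly the symptom described above.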
Examples of weight updates:
- HMM. (exercise)
- A coalescent model.