Design of MCMC samplers
Alexandre Bouchard-Côté
How MCMC methods are designed.
Goal: build a Markov chain with transition kernel \(K\) such that:
- \(K\) satisfies global balance with respect to the posterior \(\pi\) (\(K\) is “\(\pi\)-invariant”)
- \(K\) is irreducible (for every pair of states \(x\), \(x'\), there is a number of steps \(n\) such that the chain can move from \(x\) to \(x'\) in \(n\) steps; mathematically, \(K^n(x'|x) > 0\))
If both conditions are satisfied, then we have a law of large numbers.
Main tools used to construct efficient algorithms satisfying conditions (1) and (2) above:
- Metropolis-Hastings algorithm (introduced last time + current exercise)
- Auxiliary variables/augmented targets (examples: model saturation, slice sampling)
- Mixture and Alternation of kernels (focus of this topic)
- Marginalization (the opposite of augmentation; often less useful than the above three, as it is problem-specific and can incur a high per-iteration cost)
Mixture and alternation: example
- Suppose you have a graphical model with two nodes, \(x_1\) and \(x_2\), \(x = (x_1, x_2)\)
- You have a first proposal that changes only \(x_1\). Denote the associated Metropolis-Hastings algorithm by \(K_1\)
- You have a second proposal that changes only \(x_2\). Denote the associated Metropolis-Hastings algorithm by \(K_2\)
Poll: which condition(s) does \(K_1\) alone satisfy, if any?
- \(\pi\)-invariance and irreducibility
- \(\pi\)-invariance only
- irreducibility only
- None
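To make the setup concrete, here is a minimal sketch (a hypothetical bivariate standard normal target, not part of the slides) of a Metropolis-Hastings kernel that proposes a change to one coordinate only; running \(K_1\) by itself never moves \(x_2\):

```python
import math
import random

def log_pi(x1, x2):
    # Hypothetical target: standard bivariate normal (unnormalized).
    return -0.5 * (x1 ** 2 + x2 ** 2)

def mh_coord(x, i, step=0.5):
    """One Metropolis-Hastings move proposing a change to coordinate i only."""
    prop = list(x)
    prop[i] += random.gauss(0.0, step)          # symmetric random-walk proposal
    log_ratio = log_pi(*prop) - log_pi(*x)
    if random.random() < math.exp(min(0.0, log_ratio)):
        return tuple(prop)                      # accept
    return x                                    # reject: stay put

random.seed(1)
x = (0.0, 0.0)
for _ in range(100):
    x = mh_coord(x, 0)   # K_1: only x_1 is ever proposed...
assert x[1] == 0.0       # ...so x_2 stays frozen forever
```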
Mixing proposals
“Mixing proposals” consists of the following algorithm (applied iteratively):
- suppose you are at state \(x\)
- flip a coin with fixed bias \(p\)
- if coin is heads: return \(x' \sim K_1(\cdot|x)\)
- if coin is tails: return \(x' \sim K_2(\cdot|x)\)
Notation: \(K = p K_1 + (1-p) K_2\)
Key result: if both \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their mixture \(K\) is \(\pi\)-invariant
Take a minute to prove this result. Convince yourself at the same time that the argument crucially depends on \(p\) not varying with the current state.
Proof
\[
\begin{align*}
\sum_{x} \pi(x) K(x'|x) &= \sum_{x} \pi(x) (p K_1(x'|x) + (1-p) K_2(x'|x)) \\
&= p \sum_{x} \pi(x) K_1(x'|x) + (1-p) \sum_{x} \pi(x) K_2(x'|x) \\
&= p \pi(x') + (1-p) \pi(x') = \pi(x')
\end{align*}
\]
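The result can be checked numerically. Here is a sketch on a hypothetical 4-state target, where \(K_1\) and \(K_2\) are Metropolis-Hastings transition matrices built from two different symmetric proposals:

```python
import numpy as np

pi = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical target on 4 states

def mh_kernel(pi, proposal):
    """Metropolis-Hastings transition matrix for a symmetric proposal matrix.

    Convention: K[x, xp] = K(xp | x), so rows sum to one.
    """
    n = len(pi)
    K = np.zeros((n, n))
    for x in range(n):
        for xp in range(n):
            if xp != x and proposal[x, xp] > 0:
                K[x, xp] = proposal[x, xp] * min(1.0, pi[xp] / pi[x])
        K[x, x] = 1.0 - K[x].sum()    # rejection mass stays at x
    return K

n = len(pi)
uniform = (np.ones((n, n)) - np.eye(n)) / (n - 1)   # propose any other state
neighbor = np.zeros((n, n))                         # propose +/-1 on a cycle
for x in range(n):
    neighbor[x, (x + 1) % n] = neighbor[x, (x - 1) % n] = 0.5

K1, K2 = mh_kernel(pi, uniform), mh_kernel(pi, neighbor)
p = 0.3
K = p * K1 + (1 - p) * K2
assert np.allclose(pi @ K, pi)   # the mixture is still pi-invariant
```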
Note we can also show that if \(K_1\) and \(K_2\) satisfy detailed balance, then \(K\) also does:
\[
\begin{align*}
\pi(x) K(x'|x) &= \pi(x) (p K_1(x'|x) + (1-p) K_2(x'|x)) \\
&= p \pi(x) K_1(x'|x) + (1-p) \pi(x) K_2(x'|x) \\
&= p \pi(x') K_1(x|x') + (1-p) \pi(x') K_2(x|x') \\
&= \pi(x') K(x|x') \\
\end{align*}
\]
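A quick numerical check of this, using two hypothetical \(\pi\)-reversible 2-state transition matrices (not from the slides; every \(\pi\)-stationary 2-state chain is reversible):

```python
import numpy as np

pi = np.array([0.25, 0.75])
# Two different pi-stationary (hence reversible) 2-state kernels.
K1 = np.array([[0.4, 0.6], [0.2, 0.8]])
K2 = np.array([[0.7, 0.3], [0.1, 0.9]])

p = 0.5
K = p * K1 + (1 - p) * K2
F = pi[:, None] * K          # flow matrix: F[x, xp] = pi(x) K(xp|x)
assert np.allclose(F, F.T)   # detailed balance survives mixing
```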
Alternating proposals
“Alternating proposals” consists of the following algorithm (applied iteratively):
- suppose you are at state \(x\)
- let \(x' \sim K_1(\cdot|x)\)
- return \(x'' \sim K_2(\cdot | x')\)
Notation: \(K = K_1 K_2\)
Key result: if both \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their alternation \(K\) is \(\pi\)-invariant
Take a minute to prove this result. Hint: the notation \(K_1 K_2\) is inspired by matrix multiplication.
Proof
Note: \(K(x'' | x) = \sum_{x'} K_1(x'|x) K_2(x''|x')\) by the law of total probability. Hence:
\[
\begin{align*}
\sum_{x} \pi(x) K(x''|x) &= \sum_{x} \pi(x) \sum_{x'} K_1(x'|x) K_2(x''|x') \\
&= \sum_{x'} K_2(x''|x') \sum_{x} \pi(x) K_1(x'|x) \\
&= \sum_{x'} K_2(x''|x') \pi(x') = \pi(x'') \\
\end{align*}
\]
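The matrix-multiplication view makes this easy to check numerically; a sketch with two hypothetical \(\pi\)-invariant 2-state transition matrices (rows indexed by the current state):

```python
import numpy as np

pi = np.array([0.25, 0.75])
# Two different pi-invariant 2-state kernels: pi @ K1 == pi and pi @ K2 == pi.
K1 = np.array([[0.4, 0.6], [0.2, 0.8]])
K2 = np.array([[0.7, 0.3], [0.1, 0.9]])

K = K1 @ K2                    # alternation = matrix product
assert np.allclose(pi @ K, pi) # still pi-invariant
```

The proof is the same computation: \(\pi (K_1 K_2) = (\pi K_1) K_2 = \pi K_2 = \pi\).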
Note: in contrast to mixtures, even if \(K_1\) and \(K_2\) satisfy detailed balance, \(K\) may not satisfy detailed balance
- Almost all known non-reversible MCMC algorithms (i.e., those that satisfy global but not detailed balance) exploit this phenomenon and can be written as an alternation of kernels: the individual kernels are reversible (satisfy detailed balance), but their alternation is non-reversible
- Counter-example where alternating two reversible kernels leads to a non-reversible kernel:
- state space: the permutations of \(\{a, b, c\}\)
- \(\pi\): uniform on the \(3! = 6\) elements
- \(K_1\): swap the first two elements
- \(K_2\): swap the last two elements
- \(K = K_1 K_2\)
- let \(x = (a, b, c)\)
- let \(x' = (b, c, a)\)
- claim: \(\pi(x) K(x'|x) > 0\) but \(\pi(x') K(x|x') = 0\).
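The claim can be verified mechanically. Since both kernels are deterministic, \(K\) maps each state to exactly one successor, so \(K(x|x') = 0\) whenever \(x\) is not that successor:

```python
from itertools import permutations

states = list(permutations("abc"))   # 3! = 6 states; pi is uniform on them

def swap(x, i, j):
    x = list(x)
    x[i], x[j] = x[j], x[i]
    return tuple(x)

def K(x):
    # Alternation: first K_1 (swap the first two), then K_2 (swap the last two).
    return swap(swap(x, 0, 1), 1, 2)

assert all(K(s) in states for s in states)  # K permutes the state space

x, xp = tuple("abc"), tuple("bca")
assert K(x) == xp    # pi(x) K(x'|x) > 0: x reaches x' in one step
assert K(xp) != x    # pi(x') K(x|x') = 0: x' cannot reach x, so K is non-reversible
```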
The surprising richness of the “MCMC methodology game”
There is a very small number of rules:
- The Metropolis-Hastings algorithm is \(\pi\)-invariant
- Auxiliary variables/augmented targets
- Mixture and Alternation of kernels preserve \(\pi\)-invariance
- Marginalization
But a huge number of MCMC samplers can be constructed based on these very simple rules
- Slice samplers
- Metropolis-within-Gibbs (earlier example with \(x = (x_1, x_2)\))
- Parallel and serial tempering (next topic)
- Hamiltonian Monte Carlo
- Reversible jump MCMC (soon)
- Many more!
Optional readings
For a good introduction to MCMC methodologies, see chapters 1–3 in Geyer’s notes