Design of MCMC samplers
Alexandre Bouchard-Côté
How MCMC methods are designed.
Goal: build a Markov chain with transition kernel \(K\) such that:
- \(K\) satisfies global balance with respect to the posterior \(\pi\) (\(K\) is “\(\pi\)-invariant”)
- \(K\) is irreducible (for every pair of states \(x\), \(x'\), there is a number of steps \(n\) such that the chain can move from \(x\) to \(x'\) in \(n\) steps; mathematically, \(K^n(x'|x) > 0\))
If both conditions are satisfied, then we have a law of large numbers.
Main tools used to construct efficient algorithms satisfying conditions (1) and (2) above:
- Metropolis-Hastings algorithm (introduced last time + current exercise)
- Auxiliary variables/augmented targets (examples: model saturation, slice sampling)
- Mixture and Alternation of kernels (focus of this topic)
- Marginalization (the opposite of augmentation; often less useful than the above three, as it is problem-specific and can incur a high per-iteration cost)
Mixture and alternation: example
- Suppose you have a graphical model with two nodes, \(x_1\) and \(x_2\), \(x = (x_1, x_2)\)
- You have a first proposal that changes only \(x_1\). Denote the associated Metropolis-Hastings algorithm by \(K_1\)
- You have a second proposal that changes only \(x_2\). Denote the associated Metropolis-Hastings algorithm by \(K_2\)
Poll: which condition(s) does \(K_1\) alone satisfy, if any?
- \(\pi\)-invariance and irreducibility
- \(\pi\)-invariance only
- irreducibility only
- None
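To make the setup concrete, here is a minimal sketch (a hypothetical bivariate standard normal target, not part of the slides) of a Metropolis-Hastings kernel that proposes a change to one coordinate only; running \(K_1\) by itself never moves \(x_2\):

```python
import math
import random

def log_pi(x1, x2):
    # Hypothetical target: standard bivariate normal (unnormalized).
    return -0.5 * (x1 ** 2 + x2 ** 2)

def mh_coord(x, i, step=0.5):
    """One Metropolis-Hastings move proposing a change to coordinate i only."""
    prop = list(x)
    prop[i] += random.gauss(0.0, step)          # symmetric random-walk proposal
    log_ratio = log_pi(*prop) - log_pi(*x)
    if random.random() < math.exp(min(0.0, log_ratio)):
        return tuple(prop)                      # accept
    return x                                    # reject: stay put

random.seed(1)
x = (0.0, 0.0)
for _ in range(100):
    x = mh_coord(x, 0)   # K_1: only x_1 is ever proposed...
assert x[1] == 0.0       # ...so x_2 stays frozen forever
```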
Mixing proposals
“Mixing proposals” consists of the following algorithm (applied iteratively):
- suppose you are at state \(x\)
- flip a coin with fixed bias \(p\)
- if coin is heads: return \(x' \sim K_1(\cdot|x)\)
- if coin is tails: return \(x' \sim K_2(\cdot|x)\)
Notation: \(K = p K_1 + (1-p) K_2\)
Key result: if both \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their mixture \(K\) is \(\pi\)-invariant
Take a minute to prove this result. Convince yourself at the same time that the argument crucially depends on \(p\) not varying with the current state.
Proof
\[
\begin{align*}
\sum_{x} \pi(x) K(x'|x) &= \sum_{x} \pi(x) (p K_1(x'|x) + (1-p) K_2(x'|x)) \\
&= p \sum_{x} \pi(x) K_1(x'|x) + (1-p) \sum_{x} \pi(x) K_2(x'|x) \\
&= p \pi(x') + (1-p) \pi(x') = \pi(x')
\end{align*}
\]
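The result can be checked numerically. Here is a sketch on a hypothetical 4-state target, where \(K_1\) and \(K_2\) are Metropolis-Hastings transition matrices built from two different symmetric proposals:

```python
import numpy as np

pi = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical target on 4 states

def mh_kernel(pi, proposal):
    """Metropolis-Hastings transition matrix for a symmetric proposal matrix.

    Convention: K[x, xp] = K(xp | x), so rows sum to one.
    """
    n = len(pi)
    K = np.zeros((n, n))
    for x in range(n):
        for xp in range(n):
            if xp != x and proposal[x, xp] > 0:
                K[x, xp] = proposal[x, xp] * min(1.0, pi[xp] / pi[x])
        K[x, x] = 1.0 - K[x].sum()    # rejection mass stays at x
    return K

n = len(pi)
uniform = (np.ones((n, n)) - np.eye(n)) / (n - 1)   # propose any other state
neighbor = np.zeros((n, n))                         # propose +/-1 on a cycle
for x in range(n):
    neighbor[x, (x + 1) % n] = neighbor[x, (x - 1) % n] = 0.5

K1, K2 = mh_kernel(pi, uniform), mh_kernel(pi, neighbor)
p = 0.3
K = p * K1 + (1 - p) * K2
assert np.allclose(pi @ K, pi)   # the mixture is still pi-invariant
```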
Note we can also show that if \(K_1\) and \(K_2\) satisfy detailed balance, then \(K\) also does:
\[
\begin{align*}
\pi(x) K(x'|x) &= \pi(x) (p K_1(x'|x) + (1-p) K_2(x'|x)) \\
&= p \pi(x) K_1(x'|x) + (1-p) \pi(x) K_2(x'|x) \\
&= p \pi(x') K_1(x|x') + (1-p) \pi(x') K_2(x|x') \\
&= \pi(x') K(x|x') \\
\end{align*}
\]
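A quick numerical check of this, using two hypothetical \(\pi\)-reversible 2-state transition matrices (not from the slides; every \(\pi\)-stationary 2-state chain is reversible):

```python
import numpy as np

pi = np.array([0.25, 0.75])
# Two different pi-stationary (hence reversible) 2-state kernels.
K1 = np.array([[0.4, 0.6], [0.2, 0.8]])
K2 = np.array([[0.7, 0.3], [0.1, 0.9]])

p = 0.5
K = p * K1 + (1 - p) * K2
F = pi[:, None] * K          # flow matrix: F[x, xp] = pi(x) K(xp|x)
assert np.allclose(F, F.T)   # detailed balance survives mixing
```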
Alternating proposals
“Alternating proposals” consists of the following algorithm (applied iteratively):
- suppose you are at state \(x\)
- let \(x' \sim K_1(\cdot|x)\)
- return \(x'' \sim K_2(\cdot | x')\)
Notation: \(K = K_1 K_2\)
Key result: if both \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their alternation \(K\) is \(\pi\)-invariant
Take a minute to prove this result. Hint: the notation \(K_1 K_2\) is inspired by matrix multiplication.
Proof
Note: \(K(x'' | x) = \sum_{x'} K_1(x'|x) K_2(x''|x')\) by the law of total probability. Hence:
\[
\begin{align*}
\sum_{x} \pi(x) K(x''|x) &= \sum_{x} \pi(x) \sum_{x'} K_1(x'|x) K_2(x''|x') \\
&= \sum_{x'} K_2(x''|x') \sum_{x} \pi(x) K_1(x'|x) \\
&= \sum_{x'} K_2(x''|x') \pi(x') = \pi(x'') \\
\end{align*}
\]
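The matrix-multiplication view makes this easy to check numerically; a sketch with two hypothetical \(\pi\)-invariant 2-state transition matrices (rows indexed by the current state):

```python
import numpy as np

pi = np.array([0.25, 0.75])
# Two different pi-invariant 2-state kernels: pi @ K1 == pi and pi @ K2 == pi.
K1 = np.array([[0.4, 0.6], [0.2, 0.8]])
K2 = np.array([[0.7, 0.3], [0.1, 0.9]])

K = K1 @ K2                    # alternation = matrix product
assert np.allclose(pi @ K, pi) # still pi-invariant
```

The proof is the same computation: \(\pi (K_1 K_2) = (\pi K_1) K_2 = \pi K_2 = \pi\).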
Note: in contrast to mixtures, even if \(K_1\) and \(K_2\) satisfy detailed balance, \(K\) may not satisfy detailed balance
- Almost all known non-reversible MCMC algorithms (i.e., those that satisfy global but not detailed balance) exploit this phenomenon and can be written as an alternation of kernels: the individual kernels are reversible (satisfy detailed balance), but their alternation is non-reversible
- Counter-example where alternating two reversible kernels leads to a non-reversible kernel:
- state space: the permutations of \(\{a, b, c\}\)
- \(\pi\): uniform on the \(3! = 6\) elements
- \(K_1\): swap the first two elements
- \(K_2\): swap the last two elements
- \(K = K_1 K_2\)
- let \(x = (a, b, c)\)
- let \(x' = (b, c, a)\)
- claim: \(\pi(x) K(x'|x) > 0\) but \(\pi(x') K(x|x') = 0\).
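The claim can be verified mechanically. Since both kernels are deterministic, \(K\) maps each state to exactly one successor, so \(K(x|x') = 0\) whenever \(x\) is not that successor:

```python
from itertools import permutations

states = list(permutations("abc"))   # 3! = 6 states; pi is uniform on them

def swap(x, i, j):
    x = list(x)
    x[i], x[j] = x[j], x[i]
    return tuple(x)

def K(x):
    # Alternation: first K_1 (swap the first two), then K_2 (swap the last two).
    return swap(swap(x, 0, 1), 1, 2)

assert all(K(s) in states for s in states)  # K permutes the state space

x, xp = tuple("abc"), tuple("bca")
assert K(x) == xp    # pi(x) K(x'|x) > 0: x reaches x' in one step
assert K(xp) != x    # pi(x') K(x|x') = 0: x' cannot reach x, so K is non-reversible
```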
The surprising richness of the “MCMC methodology game”
There is a very small number of rules:
- The Metropolis-Hastings algorithm is \(\pi\)-invariant
- Auxiliary variables/augmented targets
- Mixture and Alternation of kernels preserve \(\pi\)-invariance
- Marginalization
But a huge number of MCMC samplers can be constructed based on these very simple rules
- Slice samplers
- Metropolis-within-Gibbs (earlier example with \(x = (x_1, x_2)\))
- Parallel and serial tempering (next topic)
- Hamiltonian Monte Carlo
- Reversible jump MCMC (soon)
- Many more!
Optional readings
For a good introduction to MCMC methodologies, see chapters 1–3 in Geyer’s notes