Alexandre Bouchard-Côté

**Goal** build a Markov chain with transition kernel \(K\) such that:

- \(K\) satisfies global balance with respect to the posterior \(\pi\) (\(K\) is “\(\pi\)-invariant”)
- \(K\) is irreducible (for each pair of states \(x\) and \(x'\), there is a number of steps \(n\) such that the chain can jump from \(x\) to \(x'\) in \(n\) steps; mathematically, \(K^n(x'|x) > 0\))

If both conditions are satisfied, then we have a law of large numbers: averages along the chain converge to the corresponding posterior expectations.

**Main tools** used to construct efficient algorithms satisfying conditions (1) and (2) above:

- Metropolis-Hastings algorithm (introduced last time + current exercise)
- Auxiliary variables/augmented targets (examples: model saturation, slice sampling)
- *Mixture* and *alternation* of kernels (focus of this topic)
- Marginalization (opposite of augmentation; often less useful than the above three as it is problem-specific and can lead to high per-iteration cost)

- Suppose you have a graphical model with two nodes, \(x_1\) and \(x_2\), \(x = (x_1, x_2)\)
- You have a first proposal that changes only \(x_1\). Denote the associated Metropolis-Hastings algorithm by \(K_1\)
- You have a second proposal that changes only \(x_2\). Denote the associated Metropolis-Hastings algorithm by \(K_2\)

**Question:** individually, which of the following properties does each of these kernels satisfy?

- \(\pi\)-invariance and irreducibility
- \(\pi\)-invariance only
- irreducibility only
- None

“Mixing proposals” consists of the following algorithm (applied iteratively):

- suppose you are at state \(x\)
- flip a coin with fixed bias \(p\)
- if coin is heads: return \(x' \sim K_1(\cdot|x)\)
- if coin is tails: return \(x' \sim K_2(\cdot|x)\)

**Notation** \(K = p K_1 + (1-p) K_2\)
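The coin-flip scheme above can be sketched in code. This is a hypothetical setup (a standard normal target, two random-walk Metropolis-Hastings kernels with different step sizes; all names are illustrative), not a prescribed implementation:

```python
import math
import random

def mh_kernel(step_size):
    """Random-walk Metropolis-Hastings kernel targeting a standard normal."""
    def kernel(x, rng):
        x_prop = x + rng.gauss(0.0, step_size)
        # log acceptance ratio for pi(x) proportional to exp(-x^2/2);
        # the proposal is symmetric, so it cancels
        log_ratio = 0.5 * x * x - 0.5 * x_prop * x_prop
        return x_prop if math.log(rng.random()) < log_ratio else x
    return kernel

def mixture_kernel(k1, k2, p):
    """Mixture p*K1 + (1-p)*K2; crucially, p does not depend on x."""
    def kernel(x, rng):
        return k1(x, rng) if rng.random() < p else k2(x, rng)
    return kernel

rng = random.Random(1)
K = mixture_kernel(mh_kernel(0.5), mh_kernel(5.0), p=0.3)
x, chain = 0.0, []
for _ in range(10_000):
    x = K(x, rng)
    chain.append(x)
```

Since each component kernel is \(\pi\)-invariant (by Metropolis-Hastings) and \(p\) is a fixed constant, the mixture is \(\pi\)-invariant as well, per the key result above.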

**Key result** if both \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their mixture \(K\) is \(\pi\)-invariant

**Take a minute to prove this result.** Convince yourself at the same time that the argument crucially depends on \(p\) not varying with the current state.

\[ \begin{align*} \sum_{x} \pi(x) K(x'|x) &= \sum_{x} \pi(x) (p K_1(x'|x) + (1-p) K_2(x'|x)) \\ &= p \sum_{x} \pi(x) K_1(x'|x) + (1-p) \sum_{x} \pi(x) K_2(x'|x) \\ &= p \pi(x') + (1-p) \pi(x') = \pi(x') \end{align*} \]
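This derivation can also be checked numerically. A sketch (the 3-state target \(\pi\), the symmetric proposals, and the mixing weight are arbitrary illustrative choices):

```python
import numpy as np

def metropolis_matrix(pi, Q):
    """Transition matrix of the Metropolis algorithm for target pi
    and symmetric proposal matrix Q (rows sum to 1)."""
    n = len(pi)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
        K[i, i] = 1.0 - K[i].sum()  # stay put on rejection
    return K

pi = np.array([0.2, 0.3, 0.5])
K1 = metropolis_matrix(pi, np.full((3, 3), 1 / 3))  # uniform proposal
K2 = metropolis_matrix(pi, np.array([[0, .5, .5],
                                     [.5, 0, .5],
                                     [.5, .5, 0]]))  # propose a different state

p = 0.7
K = p * K1 + (1 - p) * K2
print(np.allclose(pi @ K1, pi), np.allclose(pi @ K2, pi), np.allclose(pi @ K, pi))
# prints: True True True
```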

**Note** we can also show that if \(K_1\) and \(K_2\) satisfy detailed balance, then \(K\) also does:

\[ \begin{align*} \pi(x) K(x'|x) &= \pi(x) (p K_1(x'|x) + (1-p) K_2(x'|x)) \\ &= p \pi(x) K_1(x'|x) + (1-p) \pi(x) K_2(x'|x) \\ &= p \pi(x') K_1(x|x') + (1-p) \pi(x') K_2(x|x') \\ &= \pi(x') K(x|x') \\ \end{align*} \]

“Alternating proposals” consists of the following algorithm (applied iteratively):

- suppose you are at state \(x\)
- let \(x' \sim K_1(\cdot|x)\)
- return \(x'' \sim K_2(\cdot | x')\)

**Notation** \(K = K_1 K_2\)

**Key result** if both \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their alternation \(K\) is \(\pi\)-invariant

**Take a minute to prove this result.** Hint: the notation \(K_1 K_2\) is inspired by matrix multiplication.

Note: \(K(x'' | x) = \sum_{x'} K_1(x'|x) K_2(x''|x')\) by the law of total probability. Hence:

\[ \begin{align*} \sum_{x} \pi(x) K(x''|x) &= \sum_{x} \pi(x) \sum_{x'} K_1(x'|x) K_2(x''|x') \\ &= \sum_{x'} K_2(x''|x') \sum_{x} \pi(x) K_1(x'|x) \\ &= \sum_{x'} K_2(x''|x') \pi(x') = \pi(x'') \\ \end{align*} \]
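In matrix form the same check applies to alternation: \(K = K_1 K_2\) is literally a matrix product. A sketch (the 3-state target and the Metropolis kernels are arbitrary illustrative choices):

```python
import numpy as np

def metropolis_matrix(pi, Q):
    """Transition matrix of the Metropolis algorithm for target pi
    and symmetric proposal matrix Q."""
    n = len(pi)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
        K[i, i] = 1.0 - K[i].sum()
    return K

pi = np.array([0.2, 0.3, 0.5])
K1 = metropolis_matrix(pi, np.full((3, 3), 1 / 3))
K2 = metropolis_matrix(pi, np.array([[0, .5, .5],
                                     [.5, 0, .5],
                                     [.5, .5, 0]]))

# K(x''|x) = sum_x' K1(x'|x) K2(x''|x') is exactly the (x, x'') entry of K1 @ K2
K = K1 @ K2
print(np.allclose(pi @ K, pi))  # pi-invariance is preserved
```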

**Note** in contrast to mixtures, even if \(K_1\) and \(K_2\) satisfy detailed balance, \(K\) does not necessarily satisfy detailed balance

- Almost all known *non-reversible* MCMC algorithms (i.e. those that satisfy global but not detailed balance) exploit this phenomenon and can be written as an alternation of kernels (the kernels individually are reversible, i.e. satisfy detailed balance, but the alternation is non-reversible)
- Counter-example where alternating two reversible kernels leads to a non-reversible kernel:
- state space: the set of permutations of \(\{a, b, c\}\)
- \(\pi\): uniform on the \(3! = 6\) elements
- \(K_1\): swap the first two elements
- \(K_2\): swap the last two elements
- \(K = K_1 K_2\)
- let \(x = (a, b, c)\)
- let \(x' = (b, c, a)\)
- claim: \(\pi(x) K(x'|x) > 0\) but \(\pi(x') K(x|x') = 0\).
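The claim can be verified mechanically. A sketch (each swap kernel is deterministic, hence a permutation matrix; a swap is its own inverse, so each kernel is a symmetric matrix and therefore reversible with respect to the uniform \(\pi\)):

```python
from itertools import permutations

import numpy as np

states = list(permutations("abc"))  # the 3! = 6 orderings
idx = {s: i for i, s in enumerate(states)}

def swap_kernel(a, b):
    """Deterministic kernel swapping positions a and b (a permutation matrix)."""
    K = np.zeros((6, 6))
    for s in states:
        t = list(s)
        t[a], t[b] = t[b], t[a]
        K[idx[s], idx[tuple(t)]] = 1.0
    return K

K1 = swap_kernel(0, 1)  # swap the first two elements
K2 = swap_kernel(1, 2)  # swap the last two elements
pi = np.full(6, 1 / 6)  # uniform target

# K1 and K2 are symmetric, so they satisfy detailed balance w.r.t. the
# uniform pi; their alternation is a 3-cycle and does not:
K = K1 @ K2
x, xp = idx[tuple("abc")], idx[tuple("bca")]
print(pi[x] * K[x, xp] > 0, pi[xp] * K[xp, x] == 0)  # prints: True True
```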

There is a very small number of rules:

- The Metropolis-Hastings algorithm is \(\pi\)-invariant
- Auxiliary variables/augmented targets
- *Mixture* and *alternation* of kernels preserve \(\pi\)-invariance
- Marginalization

But a huge number of MCMC samplers can be constructed based on these very simple rules

- Slice samplers
- Metropolis-within-Gibbs (earlier example with \(x = (x_1, x_2)\))
- Parallel and serial tempering (next topic)
- Hamiltonian Monte Carlo
- Reversible jump MCMC (soon)
- Many more!

For a good introduction to MCMC methodologies, see chapters 1–3 of Geyer’s notes