MH and slice exercises solutions

Alexandre Bouchard-Côté

Solution: Using linear algebra instead of simulating a Markov chain

Question

You were asked to explain why the code below can be used to compute the stationary distribution of a discrete Markov chain.

compute_stationary_distribution <- function(M) {
  # eigendecomposition of the transpose of the transition matrix
  eig <- eigen(t(M))
  eigenvalues <- eig$values
  # index of the largest eigenvalue with (numerically) zero imaginary part
  real_indices <- which(abs(Im(eigenvalues)) < 1e-6)
  i <- real_indices[which.max(Re(eigenvalues[real_indices]))]
  # corresponding eigenvector, normalized so that it sums to one
  statio <- Re(eig$vectors[,i])
  statio <- statio / sum(statio)
  return(statio)
}

M = matrix(c(
  0,   1, 0,
  1/2, 0, 1/2,
  0,   1, 0
), nrow=3, ncol=3, byrow=T)

barplot(compute_stationary_distribution(M))

Solution

The function eigen() computes an eigendecomposition of the transpose of the transition matrix \(M\), and which.max(..) picks out the largest eigenvalue \(\lambda\) with no imaginary part (up to numerical precision), together with a corresponding eigenvector \(v\). You will find that in all the places where we applied compute_stationary_distribution(), this largest eigenvalue was \(\lambda = 1\). This is not a coincidence: by a celebrated linear algebra result, the Perron-Frobenius theorem, for matrices formed by the transition probabilities of a Markov chain, under weak conditions there is a unique such maximum eigenvalue \(\lambda = 1\), which can be paired (without loss of generality) with an eigenvector \(v\) that is a probability vector. From the definition of eigenvalue and eigenvector (applied to \(M^\top\)), we obtain \[v^\top M = \lambda v^\top = v^\top,\] which coincides with the global balance equation introduced in class upon setting \(\pi = v^\top\). Note that the code needs to normalize the output by sum(statio) since an eigenvector is only determined up to a proportionality constant.
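As a quick sanity check (a minimal sketch reusing the matrix M defined above; the name pi_hat is just for illustration), we can verify numerically that the returned vector satisfies the global balance equation \(\pi M = \pi\) and sums to one:

pi_hat <- compute_stationary_distribution(M)
max(abs(pi_hat %*% M - pi_hat))  # should be ~0 (global balance)
sum(pi_hat)                      # should be 1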

Global vs. detailed balance exercise

Solution: showing that detailed balance/reversibility implies global balance

Detailed balance states that for all \(i,j\),

\[\pi_i M_{i,j} = \pi_j M_{j, i}.\]

Sum each side over \(i\):

\[\sum_i \pi_i M_{i,j} = \sum_i \pi_j M_{j, i}.\]

Now the right-hand side simplifies to

\[\sum_i \pi_j M_{j, i} = \pi_j \sum_i M_{j, i} = \pi_j,\] since the rows of a transition matrix sum to one.

Hence

\[\sum_i \pi_i M_{i,j} = \pi_j\]

which is the definition of global balance.
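As an illustration (a small sketch, not part of the original exercise): one can verify that the chain M from the first section is reversible with respect to \(\pi = (1/4, 1/2, 1/4)\), so we can check detailed balance numerically and see global balance follow as the column sums of the "flow" matrix \(\pi_i M_{i,j}\):

pi_vec <- c(1/4, 1/2, 1/4)    # stationary distribution of the first M
flow <- pi_vec * M            # flow[i, j] = pi_i * M[i, j]
max(abs(flow - t(flow)))      # ~0: detailed balance holds
colSums(flow)                 # recovers pi_vec: global balance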

Solution: showing that the other way around is not true

Consider \(\pi = (1/3, 1/3, 1/3)\), and

\[M_{i,j} = \left(\begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{matrix}\right). \]

We have, for every \(j\), \[\sum_i \pi_i M_{i,j} = \frac{1}{3} \sum_i M_{i,j} = \frac{1}{3} = \pi_j,\]

so that global balance is satisfied, yet \(\pi_1 M_{1,2} = 1/3\) while \(\pi_2 M_{2,1} = 0\) meaning that detailed balance is not satisfied.
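A quick numerical check of this counterexample (a sketch; the variable names are just for illustration):

pi_vec <- rep(1/3, 3)
M_cyclic <- matrix(c(
  0, 1, 0,
  0, 0, 1,
  1, 0, 0
), nrow = 3, byrow = TRUE)

pi_vec %*% M_cyclic          # equals pi_vec: global balance holds
pi_vec[1] * M_cyclic[1, 2]   # 1/3
pi_vec[2] * M_cyclic[2, 1]   # 0: detailed balance fails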

Quick overview of (finite) Markov chain theory

Optional exercise: prove the LLN for Markov chains in this simplified setup.

I outline the proof here for the case of an MCMC algorithm on 3 states, based on a proposal that picks each of the left and right neighbour states with probability \(1/2\). For a full proof in a more general setup, see https://www.math.wisc.edu/~roch/grad-prob/gradprob-notes23.pdf

Solution, part 1: Simplification of the test function

In the statement of the law of large numbers,

\[\frac{1}{n} \sum_{i=1}^n f(X_i) \to \frac{f(0) + f(1) + f(2)}{3} \;\;\text{(a.s.)}\]

let us pick, without loss of generality, \(f(x) = {{\bf 1}}[x = x_0]\) (any other function can be expressed as a linear combination of such indicators). Let us take \({{\bf 1}}[x = 1]\) for concreteness (the indicator on the middle state). Hence we want to show:

\[\frac{1}{n} \sum_{i=1}^n {{\bf 1}}[X_i = 1] \to \frac{1}{3} \;\;\text{(a.s.)}\]

Part 2: excursion

Define an 'excursion' as a subsequence \([t, t')\) of the Markov chain times where you start in step \(t\) at state \(i=1\), end in step \(t'\) at state \(i=1\), and there is no intermediate time in \((t, t')\) at which you visit \(i=1\) (i.e. \(t'\) is the first return to \(i=1\) after \(t\)).

Example: if the chain started at \(0\) visits \(X_1 = 0, X_2 = 1, X_3 = 2, X_4 = 2, X_5 = 1, X_6 = 2, X_7 = 1, \dots\), then \([2, 5)\) is the first excursion, and \([5,7)\) the second one.

The first key observation is that the lengths of the first and second excursions are independent. In fact, the lengths of all excursions are independent (by the Markov property).

The second key observation is that it is enough to show that the mean excursion length is \(3\), since the long-run frequency of visits to a state is the inverse of the mean time between visits (see https://www.math.wisc.edu/~roch/grad-prob/gradprob-notes23.pdf for details).

Part 3: Length of an excursion

In this case, the length of an excursion is given by the time to transition out of the middle state (one step), plus the time to come back (a geometric random variable with success probability \(1/2\), hence with mean \(2\)). Hence the mean excursion length is indeed \(3\), as required.

Fun fact (not needed for this question): more generally, the mean length of an excursion from a state \(x\) in a discrete state space is given by \(1/\pi(x)\).
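Here is a minimal simulation sketch of the 3-state chain described above (the sampler and variable names are just for illustration), checking that the middle state is visited about a third of the time and that the mean excursion length is about \(3\):

set.seed(1)
n_iters <- 100000
X <- integer(n_iters)
X[1] <- 0
for (t in 2:n_iters) {
  proposal <- X[t - 1] + sample(c(-1, 1), 1)  # propose left or right neighbour
  # accept if the proposal stays in {0, 1, 2} (uniform target), otherwise stay
  X[t] <- if (proposal >= 0 && proposal <= 2) proposal else X[t - 1]
}
mean(X == 1)           # visit frequency of the middle state, approximately 1/3
visits <- which(X == 1)
mean(diff(visits))     # mean excursion length, approximately 3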

Showing detailed balance

Solution: expression for the MH kernel

Let \(x\) denote the input of the MH algorithm, and \(x'\) the output. Let us also denote the proposal by \(x''\). Recall that one step of the algorithm, in the context of this exercise, is given by:

  1. Propose \(x'' \sim q(\cdot|x)\)
  2. If \(x'' \in A\), return \(x' = x''\)
  3. else return \(x' = x\)

Let \(K(x'|x)\) denote the probability of outputting \(x'\) given a starting point \(x\). We start by writing an expression for \(K\), under the extra simplifying assumption that \(q(x|x) = 0\), i.e. the proposal never proposes the current state.

Suppose first that \(x' \neq x\). Using the law of total probability, we have

\[ \begin{align*} K(x'|x) = {\mathbb{P}}(X' = x' | X = x) &= \sum_{x''} {\mathbb{P}}(X' = x', X'' = x'' | X = x) \\ &= \sum_{x''} \underbrace{{\mathbb{P}}(X'' = x'' | X = x)}_{q(x'' | x)} \underbrace{{\mathbb{P}}(X' = x' | X = x, X''= x'')}_{{{\bf 1}}[x'' = x'] {{\bf 1}}[x' \in A]} \\ &= q(x' | x) {{\bf 1}}[x' \in A] \\ \end{align*} \]

Next, we have:

\[ \begin{align*} K(x|x) &= \sum_{x''} {\mathbb{P}}(X' = x, X'' = x'' | X = x) \\ &= \sum_{x''} \underbrace{{\mathbb{P}}(X'' = x'' | X = x)}_{q(x'' | x)} \underbrace{{\mathbb{P}}(X' = x | X = x, X''= x'')}_{{{\bf 1}}[x'' \notin A]}. \end{align*} \]

Solution: showing detailed balance

Again we start with the case \(x \neq x'\); the middle equality below uses the symmetry of the proposal, \(q(x'|x) = q(x|x')\).

\[ \begin{align*} \pi(x) K(x' | x) &= \frac{{{\bf 1}}[x\in A]}{|A|} q(x' | x) {{\bf 1}}[x' \in A] \\ &= \frac{{{\bf 1}}[x'\in A]}{|A|} q(x | x') {{\bf 1}}[x \in A] \\ &= \pi(x') K(x | x') \end{align*} \]

For \(x = x'\), detailed balance holds automatically, since both sides of the equation coincide.
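As a numerical illustration (a sketch under assumptions not in the original exercise: states \(0, \dots, 4\), \(A = \{1, 2, 3\}\), and a symmetric \(\pm 1\) random-walk proposal with \(q(x|x) = 0\)), we can build the kernel \(K\) explicitly and check detailed balance with respect to the uniform distribution on \(A\):

states <- 0:4
A <- c(1, 2, 3)
pi_target <- ifelse(states %in% A, 1 / length(A), 0)  # uniform target on A

# q[i, j]: probability of proposing states[j] from states[i] (symmetric)
q <- outer(states, states, function(x, y) ifelse(abs(x - y) == 1, 1/2, 0))

# K(x'|x) = q(x'|x) 1[x' in A] for x' != x; the diagonal absorbs rejections
accept <- ifelse(states %in% A, 1, 0)
K <- sweep(q, 2, accept, "*")
diag(K) <- 1 - rowSums(K)

flow <- pi_target * K        # flow[i, j] = pi(x_i) K(x_j | x_i)
max(abs(flow - t(flow)))     # ~0: detailed balance holds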

MH exercises

Solution: invariance of \(K_1\)

We start by showing that the first “move”, given by

\[ K_1(x',z'|x,z) = {{\bf 1}}[x = x'] q(z'|x) \] is \(\tilde \pi\)-invariant, by showing it satisfies detailed balance:

\[ \begin{align*} \tilde \pi(x,z) K_1(x',z'|x,z) &= \pi(x) q(z | x) {{\bf 1}}[x = x'] q(z'|x) \\ &= \pi(x') q(z | x') {{\bf 1}}[x = x'] q(z'|x') \\ &= \pi(x') q(z'|x') {{\bf 1}}[x' = x] q(z | x') \\ &= \tilde \pi(x',z') K_1(x,z|x',z'). \end{align*} \]

Solution: invariance of \(K_2\)

As for \(K_2\), since it is an MH move based on the symmetric (deterministic swap) proposal \(q(x',z'|x,z) = {{\bf 1}}[x'=z] {{\bf 1}}[z'=x]\), we already know from a result shown in class that \(K_2\) is \(\tilde \pi\)-invariant.

Solution: invariance of MH

Finally, since MH is a deterministic alternation of \(K_1\) and \(K_2\), and each kernel is \(\tilde \pi\)-invariant, we obtain that MH is also \(\tilde \pi\)-invariant.
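To illustrate this last step (a small sketch on a hypothetical 3-state example; the helper metropolize() and the target below are not from the exercise), one can check numerically that the composition of two \(\pi\)-invariant kernels remains \(\pi\)-invariant:

# Hypothetical helper: Metropolize a symmetric proposal matrix Q against target pi_vec
metropolize <- function(Q, pi_vec) {
  n <- length(pi_vec)
  K <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in 1:n) {
      if (i != j) K[i, j] <- Q[i, j] * min(1, pi_vec[j] / pi_vec[i])
    }
  }
  diag(K) <- 1 - rowSums(K)
  K
}

pi_vec <- c(0.2, 0.3, 0.5)
Q1 <- matrix(1/3, 3, 3)                                                   # uniform proposal
Q2 <- matrix(c(0, 1/2, 1/2, 1/2, 0, 1/2, 1/2, 1/2, 0), 3, byrow = TRUE)   # never-stay proposal
K1 <- metropolize(Q1, pi_vec)
K2 <- metropolize(Q2, pi_vec)

max(abs(pi_vec %*% K1 - pi_vec))           # ~0: K1 is pi-invariant
max(abs(pi_vec %*% K2 - pi_vec))           # ~0: K2 is pi-invariant
max(abs(pi_vec %*% (K1 %*% K2) - pi_vec))  # ~0: so is the alternation K1 then K2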