Alexandre Bouchard-Côté
You were asked to explain why the code below can be used to compute the stationary distribution of a discrete Markov chain.
compute_stationary_distribution <- function(M) {
  eig <- eigen(t(M))
  eigenvalues <- eig$values
  # indices of the (numerically) real eigenvalues
  real_indices <- which(abs(Im(eigenvalues)) < 1e-6)
  # among those, pick the one with the largest real part
  i <- real_indices[which.max(Re(eigenvalues[real_indices]))]
  statio <- Re(eig$vectors[, i])
  statio <- statio / sum(statio)
  return(statio)
}
M <- matrix(c(
  0,   1,   0,
  1/2, 0,   1/2,
  0,   1,   0
), nrow=3, ncol=3, byrow=TRUE)
barplot(compute_stationary_distribution(M))
The function eigen()
computes an eigendecomposition of the transpose of the transition matrix \(M\), while which.max(..)
finds the largest eigenvalue \(\lambda\) with no imaginary part (up to numerical precision), together with a corresponding eigenvector \(v\). You will find that in all the places where we applied the function compute_stationary_distribution()
, the largest such eigenvalue was \(\lambda = 1\). This is not a coincidence: as a consequence of a celebrated linear algebra result, the Perron-Frobenius theorem, for matrices formed by the transition probabilities of a Markov chain, under weak conditions there is a unique such maximum eigenvalue \(\lambda = 1\), which can be paired, without loss of generality, with a probability vector \(v\). From the definition of eigenvalue and eigenvector, we obtain \[M^\top v = \lambda v = v,\] or equivalently \(v^\top M = v^\top\), which coincides with the global balance equation introduced in class by setting \(\pi = v^\top\). Note that the code needs to normalize the output by sum(statio)
, since an eigenvector is only specified up to a proportionality constant.
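As a sanity check of this eigenvalue argument, here is a sketch of the same computation in Python/numpy (a hypothetical translation of the R function, applied to the 3-state chain above):

```python
import numpy as np

# The 3-state transition matrix from the R snippet above
M = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

eigenvalues, eigenvectors = np.linalg.eig(M.T)
# keep only the (numerically) real eigenvalues, then pick the largest
real_idx = np.where(np.abs(eigenvalues.imag) < 1e-6)[0]
i = real_idx[np.argmax(eigenvalues[real_idx].real)]

assert np.isclose(eigenvalues[i].real, 1.0)  # Perron-Frobenius: lambda = 1
pi = eigenvectors[:, i].real
pi = pi / pi.sum()                           # normalize to a probability vector
print(pi)
assert np.allclose(pi @ M, pi)               # global balance: pi M = pi
```

For this chain the output is the stationary distribution \((1/4, 1/2, 1/4)\).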
Detailed balance states that for all \(i,j\),
\[\pi_i M_{i,j} = \pi_j M_{j, i}.\]
Sum each side over \(i\):
\[\sum_i \pi_i M_{i,j} = \sum_i \pi_j M_{j, i}.\]
Now the right-hand side simplifies to
\[\sum_i \pi_j M_{j, i} = \pi_j \sum_i M_{j, i} = \pi_j,\]
since the rows of a transition matrix sum to one. Hence
\[\sum_i \pi_i M_{i,j} = \pi_j,\]
which is the definition of global balance.
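The implication can be checked numerically. A minimal Python/numpy sketch, using the reversible 3-state chain from the first part (whose stationary distribution is \((1/4, 1/2, 1/4)\)):

```python
import numpy as np

M = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
pi = np.array([0.25, 0.5, 0.25])

# detailed balance: pi_i M_ij == pi_j M_ji for all i, j
flows = pi[:, None] * M
assert np.allclose(flows, flows.T)

# summing over i yields global balance: sum_i pi_i M_ij == pi_j
assert np.allclose(pi @ M, pi)
```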
Consider \(\pi = (1/3, 1/3, 1/3)\), and
\[M_{i,j} = \left(\begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{matrix}\right). \]
We have \[\sum_i \pi_i M_{i,j} = \frac{1}{3} \sum_i M_{i,j} = \frac{1}{3} = \pi_j\] for every \(j\) (each column of \(M\) sums to one),
so global balance is satisfied, yet \(\pi_1 M_{1,2} = 1/3\) while \(\pi_2 M_{2,1} = 0\), meaning that detailed balance is not satisfied.
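A quick numerical confirmation of this counterexample (Python/numpy sketch):

```python
import numpy as np

# The cyclic chain from the counterexample above
M = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
pi = np.full(3, 1/3)

assert np.allclose(pi @ M, pi)          # global balance holds
flows = pi[:, None] * M                 # flows[i, j] = pi_i M_ij
assert not np.allclose(flows, flows.T)  # detailed balance fails
print(flows[0, 1], flows[1, 0])         # 1/3 versus 0
```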
I outline the proof here for the case where we run an MCMC algorithm on 3 states, based on a proposal with probability \(1/2\) for each of the left and right neighbour states. For a full proof in a more general setup, see https://www.math.wisc.edu/~roch/grad-prob/gradprob-notes23.pdf
In the statement of the law of large numbers,
\[\frac{1}{n} \sum_{i=1}^n f(X_i) \to \frac{f(0) + f(1) + f(2)}{3} \;\;\text{(a.s.)}\]
let us pick, without loss of generality, \(f(x) = {{\bf 1}}[x = x_0]\) (any other function can be expressed as a linear combination of such indicators). Let us say \(f(x) = {{\bf 1}}[x = 1]\) for concreteness (the indicator on the middle state). Hence we want to show:
\[\frac{1}{n} \sum_{i=1}^n {{\bf 1}}[X_i = 1] \to \frac{1}{3} \;\;\text{(a.s.)}\]
Define an ‘excursion’ as a subsequence \([t, t')\) of the Markov chain times where you start in step \(t\) at state \(i=1\), end in step \(t'\) at \(i=1\), and there are no intermediate times in \((t, t')\) where you visit \(i=1\) (i.e. \(t'\) is the first return to \(i=1\) after \(t\)).
Example: if the chain started at \(0\) and visits \(X_1 = 0, X_2 = 1, X_3 = 2, X_4 = 2, X_5 = 1, X_6 = 2, X_7 = 1, \dots\), then \([2, 5)\) is the first excursion, and \([5,7)\) the second one.
The first key observation is that the lengths of the first and second excursions are independent. In fact, the lengths of all excursions are independent (by the Markov property).
The second key observation is that it is enough to show that the mean excursion length is \(3\), since the long-run frequency of visits to a state is the inverse of the mean length between visits (see https://www.math.wisc.edu/~roch/grad-prob/gradprob-notes23.pdf for details).
In this case, the length of an excursion is given by the time to transition out of the middle state (one step), plus the time to come back (a geometric random variable with success probability \(1/2\), hence with mean \(2\)). Hence the mean excursion length is indeed \(3\), as required.
Fun fact (not needed for this question): more generally, the mean length of an excursion from a state \(x\) in a discrete state space is \(1/\pi(x)\).
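The excursion argument can be checked by simulation. A minimal Python sketch (a hypothetical helper, not part of the original code) simulating the 3-state random-walk MH chain and measuring the time between returns to the middle state:

```python
import random

random.seed(0)

def step(x):
    # propose the left or right neighbour with probability 1/2 each;
    # off-grid proposals are rejected (uniform target, so accept otherwise)
    proposal = x + random.choice([-1, 1])
    return proposal if 0 <= proposal <= 2 else x

n = 200_000
x = 1
visits = 0
for _ in range(n):
    x = step(x)
    visits += (x == 1)

freq = visits / n
print(freq)                 # should be close to 1/3
mean_excursion = 1 / freq   # mean return time to the middle state
print(mean_excursion)       # should be close to 3
```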
Let \(x\) denote the input of the MH algorithm, and \(x'\) the output. Let \(x''\) denote the proposal. Recall that one step of the algorithm, in the context of this exercise, consists of proposing \(x''\) from \(q(\cdot|x)\) and then accepting or rejecting it.
Let \(K(x'|x)\) denote the probability of outputting \(x'\) given a start point \(x\). We start by writing an expression for \(K\), based on the extra simplifying assumption that \(q(x|x) = 0\) (the proposal never proposes the current state).
Suppose first that \(x' \neq x\). Using the law of total probability, we have
\[ \begin{align*} K(x'|x) = {\mathbb{P}}(X' = x' | X = x) &= \sum_{x''} {\mathbb{P}}(X' = x', X'' = x'' | X = x) \\ &= \sum_{x''} \underbrace{{\mathbb{P}}(X'' = x'' | X = x)}_{q(x'' | x)} \underbrace{{\mathbb{P}}(X' = x' | X = x, X''= x'')}_{{{\bf 1}}[x'' = x'] {{\bf 1}}[x' \in A]} \\ &= q(x' | x) {{\bf 1}}[x' \in A] \\ \end{align*} \]
Next, we have:
\[ \begin{align*} K(x|x) &= \sum_{x''} {\mathbb{P}}(X' = x, X'' = x'' | X = x) \\ &= \sum_{x''} \underbrace{{\mathbb{P}}(X'' = x'' | X = x)}_{q(x'' | x)} \underbrace{{\mathbb{P}}(X' = x | X = x, X''= x'')}_{{{\bf 1}}[x'' \notin A]}. \end{align*} \]
Again we start with the case \(x \neq x'\); the second equality below uses the symmetry of the proposal, \(q(x'|x) = q(x|x')\).
\[ \begin{align*} \pi(x) K(x' | x) &= \frac{{{\bf 1}}[x\in A]}{|A|} q(x' | x) {{\bf 1}}[x' \in A] \\ &= \frac{{{\bf 1}}[x'\in A]}{|A|} q(x | x') {{\bf 1}}[x \in A] \\ &= \pi(x') K(x | x') \end{align*} \]
For \(x = x'\), detailed balance automatically holds.
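The expressions for \(K\) and the detailed-balance computation can be verified on a concrete example. The following Python/numpy sketch uses a hypothetical setup: states \(0, \dots, 4\), \(A = \{1, 2, 3\}\), \(\pi\) uniform on \(A\), and a symmetric random-walk proposal \(q\) (for which \(q(x|x) = 0\)):

```python
import numpy as np

states = range(5)
A = {1, 2, 3}
pi = np.array([1 / len(A) if x in A else 0.0 for x in states])

def q(x_new, x):
    # symmetric proposal: each neighbour with probability 1/2
    return 0.5 if abs(x_new - x) == 1 else 0.0

K = np.zeros((5, 5))
for x in states:
    for x_new in states:
        if x_new != x:
            # K(x'|x) = q(x'|x) 1[x' in A], as derived above
            K[x, x_new] = q(x_new, x) * (x_new in A)
    # K(x|x) collects all rejected proposals (x'' not in A, or off the grid)
    K[x, x] = 1.0 - K[x].sum()

flows = pi[:, None] * K             # flows[x, x'] = pi(x) K(x'|x)
assert np.allclose(flows, flows.T)  # detailed balance
assert np.allclose(pi @ K, pi)      # hence pi-invariance
```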
We start by showing that the first “move”, given by
\[ K_1(x',z'|x,z) = {{\bf 1}}[x = x'] q(z'|x) \] is \(\tilde \pi\)-invariant, by showing it satisfies detailed balance:
\[ \begin{align*} \tilde \pi(x,z) K_1(x',z'|x,z) &= \pi(x) q(z | x) {{\bf 1}}[x = x'] q(z'|x) \\ &= \pi(x') q(z | x') {{\bf 1}}[x = x'] q(z'|x') \\ &= \pi(x') q(z'|x') {{\bf 1}}[x' = x] q(z | x') \\ &= \tilde \pi(x',z') K_1(x,z|x',z'). \end{align*} \]
As for \(K_2\), since it is an MH move based on a symmetric (deterministic swap) proposal, \(q(x',z'|x,z) = {{\bf 1}}[x'=z] {{\bf 1}}[z'=x]\), we already know from a result shown in class that \(K_2\) is \(\tilde \pi\)-invariant.
Finally, since MH is a deterministic alternation of \(K_1\) and \(K_2\), we obtain that MH is also \(\tilde \pi\)-invariant.
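The invariance of \(K_1\), \(K_2\), and their alternation can be checked numerically on a tiny example. The following Python/numpy sketch uses hypothetical values: two states for each of \(x\) and \(z\), an arbitrary positive target \(\pi\), and an arbitrary proposal matrix \(q\):

```python
import numpy as np

pi = np.array([0.3, 0.7])
q = np.array([[0.6, 0.4],
              [0.2, 0.8]])   # q[x, z] = q(z | x)

pairs = [(x, z) for x in range(2) for z in range(2)]
pt = np.array([pi[x] * q[x, z] for (x, z) in pairs])  # pi-tilde(x, z)

# K1: refresh z from q(.|x), keep x fixed
K1 = np.array([[(x == x2) * q[x, z2] for (x2, z2) in pairs]
               for (x, z) in pairs])

# K2: MH move with the deterministic swap proposal (x, z) -> (z, x)
K2 = np.zeros((4, 4))
for i, (x, z) in enumerate(pairs):
    j = pairs.index((z, x))
    accept = min(1.0, (pi[z] * q[z, x]) / (pi[x] * q[x, z]))
    K2[i, j] += accept          # accepted swap
    K2[i, i] += 1.0 - accept    # rejected swap: stay put

for K in (K1, K2):
    assert np.allclose(K.sum(axis=1), 1.0)  # each K is a transition matrix
    assert np.allclose(pt @ K, pt)          # pi-tilde invariance
assert np.allclose(pt @ (K1 @ K2), pt)      # hence the alternation is too
```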