Metropolis-Hastings and slice samplers

Alexandre Bouchard-Côté

Monte Carlo methods

\[\frac{1}{K} \sum_{k=1}^K f(X_k) \approx \int f(x) \pi(x) {\text{d}}x\]
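As a quick illustration of this estimator (a sketch; the choices \(f(x) = x^2\) and \(\pi = N(0,1)\) are ours, not from the slides):

```r
# Monte Carlo estimate of E[f(X)] = E[X^2] = 1 for X ~ N(0, 1)
set.seed(1)
K <- 100000
x <- rnorm(K)             # i.i.d. draws X_1, ..., X_K from pi
estimate <- mean(x^2)     # (1/K) sum_k f(X_k)
print(estimate)           # close to 1
```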

Most common setup in Bayesian statistics

Extracting the essential ingredients of the “common setup”

Outline

Prerequisite: a bit of Markov chain theory

Question: how to design a sampler for a uniform distribution on a set \(A\)?

Example: \(A = \{0, 1, 2\}\)

Idea: use a random walk so that it scales to complicated/high dimensional \(A\)

Poll: what random walk has a uniform stationary distribution on \(A\)
?

  1. Move to a neighbor at random
  2. Move to a neighbor at random, except at the boundary, where you stop moving
  3. Move to a neighbor at random, including yourself (self-transition)
  4. None of these walks would work

Markov chain 1 (incorrect)

Move to one neighbor at random?

How to check if this is correct:

  1. Naive method: simulate a long chain, compute a histogram from the list of states visited
  2. Exercise: explain why the code below is equivalent to the “Naive method”
compute_stationary_distribution <- function(M) {
  eig <- eigen(t(M))
  eigenvalues <- eig$values
  # Indices of the (numerically) real eigenvalues
  real_idx <- which(abs(Im(eigenvalues)) < 1e-6)
  # Among those, pick the largest (equal to 1 for a stochastic matrix);
  # the corresponding left eigenvector is the stationary distribution
  i <- real_idx[which.max(Re(eigenvalues[real_idx]))]
  statio <- Re(eig$vectors[, i])
  statio <- statio / sum(statio)
  return(statio)
}

M = matrix(c(
  0,   1, 0,
  1/2, 0, 1/2,
  0,   1, 0
), nrow=3, ncol=3, byrow=T)

barplot(compute_stationary_distribution(M))
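As a check on the exercise above, the naive method itself can be run directly (a sketch; the simulation length is arbitrary):

```r
# The chain-1 transition matrix from above
M <- matrix(c(
  0,   1, 0,
  1/2, 0, 1/2,
  0,   1, 0
), nrow=3, ncol=3, byrow=TRUE)

# Naive method: simulate a long chain and tabulate the fraction of time
# spent in each state; this approximates the stationary distribution
set.seed(1)
n_iters <- 100000
state <- 1
counts <- c(0, 0, 0)
for (iter in 1:n_iters) {
  state <- sample(1:3, size = 1, prob = M[state, ])
  counts[state] <- counts[state] + 1
}
print(counts / n_iters)  # close to (1/4, 1/2, 1/4), i.e. not uniform
```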

Markov chain 2 (incorrect)

Move to a neighbor at random, except at the boundary, where you stop moving.

No: the walk eventually gets absorbed at a boundary, so the limiting distribution is random, depending on the starting point (either a Dirac delta at state 1 or a Dirac delta at state 3), and in neither case uniform
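To see this concretely, chain 2 can be written as a transition matrix (the name `M2` is ours): eigenvalue 1 then has multiplicity 2, so there is no unique stationary distribution.

```r
# Chain 2: states 1 and 3 are absorbing (the walk stops at the boundary)
M2 <- matrix(c(
  1,   0, 0,
  1/2, 0, 1/2,
  0,   0, 1
), nrow=3, ncol=3, byrow=TRUE)

# Count eigenvalues equal to 1: multiplicity 2 means two distinct
# stationary distributions (the two Dirac deltas), not a unique one
print(sum(abs(eigen(t(M2))$values - 1) < 1e-6))  # 2
```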

Markov chain 3 (incorrect)

Move to a neighbor at random, including a self-transition

M = matrix(c(
  1/2, 1/2, 0,
  1/4, 1/2, 1/4,
  0,   1/2, 1/2
), nrow=3, ncol=3, byrow=T)

barplot(compute_stationary_distribution(M))

Markov chain 4: propose + accept reject

M = matrix(c(
  1/2,   1/2, 0,
  1/2,   0,   1/2,
  0,     1/2, 1/2
), nrow=3, ncol=3, byrow=T)

barplot(compute_stationary_distribution(M))

Understanding why idea/walk 1 fails while idea/walk 4 works

Occupancy vector and transition matrix

Stationarity

drawing

drawing
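In matrix form, stationarity of an occupancy vector \(\pi\) means \(\pi M = \pi\). A minimal numerical check for the uniform distribution and chain 4's doubly stochastic matrix (the name `M4` is introduced here for illustration):

```r
# Chain 4's transition matrix (doubly stochastic: rows AND columns sum to 1)
M4 <- matrix(c(
  1/2, 1/2, 0,
  1/2, 0,   1/2,
  0,   1/2, 1/2
), nrow=3, ncol=3, byrow=TRUE)

# Stationarity check: pi %*% M4 should equal pi for the uniform vector
pi_uniform <- c(1/3, 1/3, 1/3)
print(max(abs(pi_uniform %*% M4 - pi_uniform)))  # 0, up to rounding
```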

Quick overview of (finite) Markov chain theory

Generalization

Slice sampling

drawing

Slice sampling: some details

drawing

For more on slice sampling (in particular, how to avoid sensitivity to the window size): see the original slice sampling paper by R. Neal
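One slice sampling update can be sketched in R as follows; the names `slice_sample_step`, `gamma_fn`, and `window`, and the use of Neal's shrinkage procedure with a randomly positioned fixed-size window, are illustrative choices, not code from the slides.

```r
# One slice sampling update for a 1D unnormalized density gamma_fn
slice_sample_step <- function(x, gamma_fn, window = 1) {
  u <- runif(1, 0, gamma_fn(x))       # auxiliary height under the curve
  left  <- x - runif(1) * window      # randomly position a window around x
  right <- left + window
  repeat {
    x_star <- runif(1, left, right)   # propose uniformly in the window
    if (gamma_fn(x_star) >= u) return(x_star)  # inside the slice: accept
    # Outside the slice: shrink the window towards the current state
    if (x_star < x) left <- x_star else right <- x_star
  }
}

# Usage: unnormalized standard normal density
gamma_fn <- function(x) exp(-x^2 / 2)
set.seed(1)
xs <- numeric(10000)
x <- 0
for (i in 1:10000) { x <- slice_sample_step(x, gamma_fn); xs[i] <- x }
print(mean(xs))  # close to 0
```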

Relation with Metropolis-Hastings (MH)

Case 1: proposing to go “uphill”, \(\gamma(x^*) \ge \gamma(x_i)\). In this case, we always accept! See the picture:

drawing

Case 2: proposing to go “downhill”, \(\gamma(x^*) < \gamma(x_i)\). In this case, we may accept or reject… See the picture:

drawing

Poll: what is the probability that we will accept in case 2
?

  1. \(\gamma(x_i)/\gamma(x^*)\)
  2. \(\gamma(x^*)/\gamma(x_i)\)
  3. \(\gamma(x^*) + \gamma(x_i)\)
  4. \(\gamma(x^*) \cdot \gamma(x_i)\)
  5. None of the above

Acceptance probability

We can compute the probability that we accept! It’s the height of the green stick in bold divided by the height of the black stick in bold:

drawing

\[\frac{\gamma(x^*)}{\gamma(x_i)}\]

This quantity is called the Metropolis ratio

A special case of Metropolis algorithm

We obtain the following algorithm. Suppose the current state is \(x_i\):

  1. Propose a new position, \(x^*\), uniformly in a ball around \(x_i\)
  2. Compute the Metropolis ratio \[r = \frac{\gamma(x^*)}{\gamma(x_i)}\]
  3. Simulate an acceptance variable \[a \sim \text{Bern}(\min\{1, r\})\]
    • If \(a = 1\): set \(x_{i+1} = x^*\)
    • Else, set \(x_{i+1} = x_i\)
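The three steps above can be sketched in R for a one-dimensional target; the names `metropolis`, `gamma_fn`, and `step_size` are our own, and the “ball” here is a symmetric uniform interval around \(x_i\).

```r
# Random-walk Metropolis for a 1D unnormalized density gamma_fn
metropolis <- function(gamma_fn, x0, n_iters, step_size = 1) {
  xs <- numeric(n_iters)
  x <- x0
  for (i in 1:n_iters) {
    x_star <- x + runif(1, -step_size, step_size)  # step 1: propose in a ball
    r <- gamma_fn(x_star) / gamma_fn(x)            # step 2: Metropolis ratio
    if (runif(1) < min(1, r)) x <- x_star          # step 3: accept, ...
    xs[i] <- x                                     # ... else stay put
  }
  xs
}

# Usage: unnormalized standard normal density
set.seed(1)
samples <- metropolis(function(x) exp(-x^2 / 2), x0 = 0, n_iters = 10000)
print(mean(samples))  # close to 0
```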

General Metropolis algorithm

More general proposals:

Combinations of proposals:

Another generalization: Metropolis-Hastings
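For reference, with an asymmetric proposal density \(q(x^* \mid x)\), the standard Metropolis-Hastings acceptance ratio (stated here for completeness; the derivation is not in these slides) is

\[r = \frac{\gamma(x^*)\, q(x_i \mid x^*)}{\gamma(x_i)\, q(x^* \mid x_i)}\]

which reduces to the Metropolis ratio when \(q\) is symmetric.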