STAT 520 - Bayesian Analysis

Alexandre Bouchard-Côté

3/20/2019

Goals

Today:

Recap: sampling from a uniform distribution using propose + accept-reject

Recap: modelling the randomness of MCMC

Recap: Occupancy vector and transition matrix

Recap: Stationarity

drawing

Recap: (finite) Markov chain theory
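To make the recap concrete, here is a minimal sketch of an occupancy vector being pushed through a transition matrix until it reaches stationarity. The 3-state matrix `P` and the helper names `propagate` and `stationary` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical 3-state transition matrix: row i holds P(next = j | current = i).
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])

def propagate(v, P, n_steps):
    """Push an occupancy (row) vector through n_steps transitions: v P^n."""
    for _ in range(n_steps):
        v = v @ P
    return v

def stationary(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to one."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

v0 = np.array([1.0, 0.0, 0.0])   # start with all mass on the first state
pi = stationary(P)               # satisfies pi P = pi
v100 = propagate(v0, P, 100)     # occupancy vector after 100 steps
```

For this particular `P`, the occupancy vector started from a single state ends up numerically indistinguishable from `pi`, which is what stationarity plus convergence predicts for a well-behaved finite chain.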

Recap: slice sampling

Now let us see how JAGS proposes path changes, and then how to make the accept-reject decision

drawing

For more on slice sampling (in particular, how to avoid sensitivity to the window size): see the original slice sampling paper by R. Neal
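A minimal sketch of the simplified move recapped here, assuming a fixed window and a one-dimensional target: draw a height uniformly under the curve at the current point, propose uniformly within the window, and accept if the proposal is still under the curve at that height. The target `gamma` (a standard normal without its constant), the window size, and the function names are assumptions made for illustration; this is not JAGS's actual implementation, nor the adaptive scheme from Neal's paper.

```python
import math
import random

def gamma(x):
    """Hypothetical unnormalised target: standard normal without its constant."""
    return math.exp(-0.5 * x * x)

def slice_step(x, window, rng):
    """One simplified move: draw a height under gamma(x), propose in a window."""
    height = rng.uniform(0.0, gamma(x))           # auxiliary height variable
    proposal = x + rng.uniform(-window, window)   # uniform proposal around x
    # Accept only if the proposed point lies under the curve at that height.
    return proposal if height < gamma(proposal) else x

def run_chain(n_iter, x0=0.0, window=1.0, seed=1):
    """Run the simplified sampler and return the whole trajectory."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_iter):
        xs.append(slice_step(xs[-1], window, rng))
    return xs

samples = run_chain(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With this target, the empirical mean and variance should land close to 0 and 1 respectively; a much smaller or larger `window` still gives a valid chain but mixes more slowly.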

Recap: relation with Metropolis-Hastings (MH)

Case 1: proposing to go “uphill”, \(\gamma(x^*) \ge \gamma(x_i)\). In this case, we always accept! See the picture:

drawing

Case 2: proposing to go “downhill”, \(\gamma(x^*) < \gamma(x_i)\). In this case, we may accept or reject… See the picture:

drawing

We can compute the probability that we accept! It’s the height of the green stick in bold divided by the height of the black stick in bold:

\[\frac{\gamma(x^*)}{\gamma(x_i)}\]

This quantity is called the MH ratio

Generalization: non-uniform proposal

\[\min\left\{1, \frac{\gamma(x^*)}{\gamma(x_i)}\frac{q(x_i|x^*)}{q(x^*|x_i)}\right\}\]

Notice that we indeed get back our simpler formula \(\gamma(x^*)/\gamma(x_i)\) when \(q\) is symmetric.
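The acceptance rule with a non-uniform proposal can be sketched as follows. Everything beyond the formula itself is an assumption made for illustration: the target \(\gamma(x) = x e^{-x}\) on the positive reals (an unnormalised Gamma(2, 1), whose mean is 2), the multiplicative log-normal proposal, and the function names.

```python
import math
import random

def mh_accept_prob(gamma_cur, gamma_prop, q_back, q_fwd):
    """min{1, gamma(x*)/gamma(x_i) * q(x_i|x*)/q(x*|x_i)}."""
    return min(1.0, (gamma_prop / gamma_cur) * (q_back / q_fwd))

def gamma(x):
    """Hypothetical unnormalised target on the positive reals: x * exp(-x)."""
    return x * math.exp(-x) if x > 0 else 0.0

def q_density(y, x, sigma):
    """Density of the asymmetric proposal y = x * exp(sigma * Z), Z ~ N(0, 1)."""
    z = (math.log(y) - math.log(x)) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

def run_mh(n_iter, x0=1.0, sigma=1.0, seed=1):
    """Run MH with the log-normal proposal and return the visited states."""
    rng = random.Random(seed)
    x, xs = x0, []
    for _ in range(n_iter):
        x_star = x * math.exp(sigma * rng.gauss(0.0, 1.0))
        a = mh_accept_prob(gamma(x), gamma(x_star),
                           q_density(x, x_star, sigma),   # q(x_i | x*)
                           q_density(x_star, x, sigma))   # q(x* | x_i)
        if rng.random() < a:
            x = x_star
        xs.append(x)
    return xs

samples = run_mh(20000)
mh_mean = sum(samples) / len(samples)
```

Passing equal values for `q_back` and `q_fwd` recovers the symmetric case: `mh_accept_prob(2.0, 1.0, 1.0, 1.0)` is the plain ratio of heights, 0.5.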

First failure mode: unidentifiability

drawing

Why is this landscape hard to explore for slice sampling?

Understanding the failure

drawing

Note: other situations may cause this as well, but unidentifiability (or “weak identifiability”) is the most common one

Second failure mode: “multimodality”

drawing

Understanding the failure

drawing

Solutions to these two failure modes

Parallel tempering: annealing

drawing

Combination of two ideas: “annealing” and “parallel chains”

\[\pi_{\beta}(x) = (\pi(x))^{\beta}\]

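where \(\beta\) controls an interpolation between the original landscape \(\pi(x)\) (at \(\beta = 1\)) and a completely flat landscape (at \(\beta = 0\)).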

Let us call \(\beta\) the annealing parameter

Now we have several different “landscapes”, from smooth to rough. We will explore all of them simultaneously!
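A small sketch of how the annealing parameter reshapes a landscape, assuming a hypothetical bimodal target (an equal mixture of two narrow normals centred at -3 and 3, left unnormalised). It measures the log-height of a mode relative to the valley between the modes for several values of \(\beta\); the target and the names `log_pi`, `log_pi_beta`, and `barriers` are illustrative only.

```python
import numpy as np

def log_pi(x):
    """Hypothetical bimodal target: unnormalised mix of N(-3, 0.5^2) and N(3, 0.5^2)."""
    return np.logaddexp(-0.5 * ((x + 3.0) / 0.5) ** 2,
                        -0.5 * ((x - 3.0) / 0.5) ** 2)

def log_pi_beta(x, beta):
    """Log of the annealed landscape pi(x)^beta."""
    return beta * log_pi(x)

betas = [1.0, 0.5, 0.1, 0.01]
# Log-height of a mode (x = 3) relative to the valley between modes (x = 0).
barriers = [float(log_pi_beta(3.0, b) - log_pi_beta(0.0, b)) for b in betas]
```

Because the log density is simply multiplied by \(\beta\), the barrier shrinks in proportion to \(\beta\), so chains running at small \(\beta\) can cross between modes far more easily.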

Parallel tempering: swaps

drawing
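The slide shows the swap move only as a picture, so the following is a sketch under an assumption: that swaps are accepted with the usual Metropolis rule for exchanging the states of two chains targeting \(\pi^{\beta_1}\) and \(\pi^{\beta_2}\). The function names and the log-scale interface are hypothetical.

```python
import math
import random

def swap_accept_prob(log_pi_x1, log_pi_x2, beta1, beta2):
    """Probability of exchanging x1 (held at beta1) with x2 (held at beta2).

    The ratio pi(x2)^beta1 pi(x1)^beta2 / (pi(x1)^beta1 pi(x2)^beta2)
    simplifies on the log scale to (beta1 - beta2) * (log pi(x2) - log pi(x1)).
    """
    log_ratio = (beta1 - beta2) * (log_pi_x2 - log_pi_x1)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def try_swap(x1, x2, log_pi, beta1, beta2, rng):
    """Return the (possibly exchanged) pair of states for the two chains."""
    a = swap_accept_prob(log_pi(x1), log_pi(x2), beta1, beta2)
    return (x2, x1) if rng.random() < a else (x1, x2)

# Example: the colder chain (beta = 1) holds a worse point than the warmer one.
p_up = swap_accept_prob(-2.0, 0.0, 1.0, 0.5)     # better point moves to the cold chain
p_down = swap_accept_prob(0.0, -2.0, 1.0, 0.5)   # worse point would move to the cold chain
```

Working on the log scale avoids underflow when the densities are tiny, which is the common case in high-dimensional posteriors.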

Non-reversible parallel tempering

Parallel Tempering: Reversible (top) versus non-reversible (bottom)

drawing

Blang: a PPL implementation of non-reversible parallel tempering

Hamiltonian Monte Carlo

Example: unidentifiable model coded up in Stan

Discussion of last week's example: change point problem / segmentation

drawing

Change point problem

Data looks like this:

drawing

Can you build a Bayesian model for that?

How to pick distributions?

Frequently used distributions

By type:

By interpretation:

Mathematical model