Recap: hierarchical model


In our case: just make \(\mu\) and \(s\) random! (or equivalently, \(\alpha\) and \(\beta\))
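As a minimal sketch of what making \(\mu\) and \(s\) random looks like, the Python snippet below draws one failure probability from the hierarchical prior. It assumes the usual mean/concentration reparametrization \(\alpha = \mu s\), \(\beta = (1 - \mu) s\), reads \(1/10000\) as the rate of the exponential prior, and uses illustrative function names that are not part of the course code.

```python
import random

def beta_params(mu, s):
    """Map mean mu in (0, 1) and concentration s > 0 to Beta's (alpha, beta)."""
    return mu * s, (1.0 - mu) * s

def draw_failure_probability(rng):
    """One draw from the hierarchical prior on a failure probability."""
    mu = rng.random()                    # mu ~ Unif(0, 1) = Beta(1, 1)
    s = rng.expovariate(1.0 / 10000.0)   # s ~ Exponential(rate 1/10000), assumed rate
    alpha, beta = beta_params(mu, s)
    return rng.betavariate(alpha, beta)  # p | mu, s ~ Beta(alpha, beta)

rng = random.Random(1)
print(beta_params(0.25, 8.0))  # (2.0, 6.0)
print(draw_failure_probability(rng))
```

Under this reparametrization the Beta mean is \(\alpha / (\alpha + \beta) = \mu\), which is why the two descriptions are interchangeable.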

Recap: sensitivity heuristic

It seems we have introduced new problems, since we again have hyperparameters, namely those of the priors on \(\mu\) and \(s\). Here we picked \(\mu \sim {\text{Beta}}(1,1) = {\text{Unif}}(0, 1)\) and \(s \sim \text{Exponential}(1/10000)\).

Key point: yes, but now we are less sensitive to these choices!

Why? Heuristic: say you have a random variable connected to some hyperparameters (grey squares) and random variables connected to data (circles).

Before going hierarchical: for maiden/early flights we had


After going hierarchical:


JAGS under the hood

Key technique used by JAGS: slice sampling.

Why should we look under the hood? To understand the failure modes and how to address them.

Things the slice sampler has problems with (failure modes):

Solutions to these problems:

First failure mode: unidentifiability

Recall our unidentifiable model:

model Unidentifiable {
  laws {
    p ~ ContinuousUniform(0, 1)
    p2 ~ ContinuousUniform(0, 1)
    nFails | p, p2, nLaunches ~ Binomial(nLaunches, p * p2)
  }
}

It gave us the following posterior:


MCMC: failure modes

Unidentifiability creates computational difficulties.


model {
  p1 ~ dunif(0, 1)
  p2 ~ dunif(0, 1)
  x ~ dbin(p1 * p2, n)
}
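To see where the difficulty comes from, here is a small Python sketch of the unnormalized posterior of this model. The data values (`x = 3`, `n = 10`) are hypothetical, and the function name `gamma` is just a label for prior times likelihood. Because the likelihood depends on `p1` and `p2` only through their product, every pair on a curve \(p_1 p_2 = c\) gets the same value.

```python
import math

def gamma(p1, p2, x, n):
    """Unnormalized posterior: uniform priors times the Binomial(n, p1 * p2) likelihood."""
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        return 0.0  # outside the support of the uniform priors
    p = p1 * p2
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

x, n = 3, 10  # hypothetical data, for illustration only

# Three different parameter pairs with the same product p1 * p2 = 0.3.
print(gamma(0.6, 0.5, x, n), gamma(0.5, 0.6, x, n), gamma(0.3, 1.0, x, n))
# A pair with a different product.
print(gamma(0.6, 0.6, x, n))
```

The data cannot distinguish between points on the same curve, so the posterior mass sits along a thin curved ridge rather than around a single point.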

Second failure mode: “multimodality”

Prerequisite to understand slice sampling and MH

Question: how to design a sampler for a uniform distribution on a set \(A\), assuming only pointwise evaluation?

Recall: pointwise evaluation means all you can do is: “given \(x\), answer if \(x \in A\)”.

Example: \(A = \{0, 1, 2\}\)

Idea: use a random walk, so that the method scales to complicated/high-dimensional \(A\).

Idea 1 (incorrect)

Move to one neighbor at random.
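A quick simulation suggests why Idea 1 fails on \(A = \{0, 1, 2\}\). The sketch below assumes that "neighbor" means an adjacent integer inside \(A\), and compares Idea 1 with a variant that proposes \(x - 1\) or \(x + 1\) and stays put when the proposal falls outside \(A\). The variant is a standard fix, included here as an assumption about where the argument is heading; all function names are illustrative.

```python
import random
from collections import Counter

def in_A(x):
    """Pointwise evaluation: the only access we have to A = {0, 1, 2}."""
    return x in (0, 1, 2)

def naive_walk(x, n_steps, rng):
    """Idea 1: move to a uniformly chosen neighbor that lies inside A."""
    visited = []
    for _ in range(n_steps):
        neighbors = [y for y in (x - 1, x + 1) if in_A(y)]
        x = rng.choice(neighbors)
        visited.append(x)
    return visited

def walk_with_rejection(x, n_steps, rng):
    """Propose x - 1 or x + 1 with equal probability; stay put if outside A."""
    visited = []
    for _ in range(n_steps):
        proposal = x + rng.choice((-1, 1))
        if in_A(proposal):
            x = proposal
        visited.append(x)  # a rejected proposal records the current state again
    return visited

def frequencies(visited):
    """Fraction of time spent in each state of A."""
    counts = Counter(visited)
    return {state: counts[state] / len(visited) for state in (0, 1, 2)}

rng = random.Random(0)
print(frequencies(naive_walk(0, 30000, rng)))           # state 1 gets about 1/2
print(frequencies(walk_with_rejection(0, 30000, rng)))  # each state gets about 1/3
```

Under Idea 1 the middle state is visited every other step, so the walk over-represents it; with rejections, the time spent in each state is roughly equal.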

Quick overview of (finite) Markov chain theory

Understanding JAGS

Recall: JAGS code and graphical model

slope ~ dnorm(0, 0.001)
intercept ~ dnorm(0, 0.001)

for (i in 1:length(temperature)) {
  p[i] <- ilogit(intercept + slope * temperature[i]) # recall: ilogit = logistic
  F[i] ~ dbern(p[i])
}


Decision tree


Note: make sure you understand how the bivariate posterior density plot matches up with the decision tree (what is on the x axis? on the y axis?).

Computing the posterior

There are (at least) two ways to get to the posterior:

Recall: Bayes rule

We are able to compute \(\gamma(x)\) for any given path, but the problem is that computing \(Z\) is hard.

What JAGS/MCMC allows: computing the posterior when you only know \(\gamma\), not \(Z\).

Quick recap: computing \(\gamma\)

Let’s start with a review of how to compute \(\gamma(x)\) for a given path. I want to convince you that JAGS can do that automatically from the model you wrote.
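As a rough illustration of that computation, the Python sketch below evaluates \(\log \gamma(x)\) for the logistic regression model recalled above, one factor per line of the JAGS code. It relies on the fact that JAGS's `dnorm` is parametrized by mean and precision; the toy data and the function names are made up for illustration.

```python
import math

def log_normal_density(x, mean, precision):
    """Log density of a normal in JAGS's (mean, precision) parametrization."""
    return 0.5 * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * (x - mean) ** 2

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_gamma(slope, intercept, temperature, failures):
    """log gamma(x) = log prior(slope, intercept) + log likelihood(failures)."""
    total = log_normal_density(slope, 0.0, 0.001)       # slope ~ dnorm(0, 0.001)
    total += log_normal_density(intercept, 0.0, 0.001)  # intercept ~ dnorm(0, 0.001)
    for t, f in zip(temperature, failures):
        p = logistic(intercept + slope * t)             # p[i] <- ilogit(...)
        total += math.log(p) if f == 1 else math.log(1.0 - p)  # F[i] ~ dbern(p[i])
    return total

# Made-up toy data, for illustration only.
temperature = [53.0, 66.0, 70.0, 75.0]
failures = [1, 0, 0, 0]
print(log_gamma(-0.2, 12.0, temperature, failures))
```

Working on the log scale turns the product of factors into a sum, which avoids numerical underflow when there are many observations. This simple version does not guard against extreme values of the linear predictor.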


Example on board:

JAGS/slice sampling for computing the posterior when you only know \(\gamma\), not \(Z\)

Output of JAGS/MCMC: a list of samples \(S = (X_1, X_2, \dots, X_n)\) from which we can answer all the questions we care about.
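For instance, posterior expectations and probabilities can be approximated by averaging over the list. The sketch below uses a short hypothetical list `S` and an illustrative helper name; real MCMC output would be much longer.

```python
def estimate_expectation(samples, f):
    """Approximate the posterior expectation of f(X) by averaging over the list S."""
    return sum(f(x) for x in samples) / len(samples)

# Hypothetical MCMC output; the repeated value 0.4 is kept, not deduplicated.
S = [0.1, 0.4, 0.4, 0.7]

posterior_mean = estimate_expectation(S, lambda x: x)
prob_above = estimate_expectation(S, lambda x: 1.0 if x > 0.3 else 0.0)
print(posterior_mean, prob_above)
```

A posterior probability is just the expectation of an indicator function, so the same helper covers both kinds of question.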

Intuition: JAGS performs a random walk under the density


Algorithm: how JAGS computes \(S\):

Note: because of “rejections” in the last step, \(S\) will generally contain duplicates, which is why I have started to use the more precise term “list” of samples rather than “set”.

More details

Now let us see how JAGS proposes path changes, and then how to make the accept-reject decision


For more on slice sampling (in particular, how to avoid sensitivity to the window size), see the original slice sampling paper by R. Neal.
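The propose-then-accept/reject scheme can be sketched in a few lines of Python. This is a simplified one-dimensional "random walk under the density" with a fixed window, written under the assumption that a proposal is accepted when the point at the current height still lies under the curve. It is not JAGS's actual implementation, and the function names and the test density are illustrative.

```python
import math
import random

def random_walk_under_density(gamma, x0, n_samples, window, rng):
    """Simplified slice-style sampler that only needs the unnormalized gamma."""
    x = x0
    samples = []
    for _ in range(n_samples):
        # Pick a uniform height under the curve at the current point.
        height = rng.uniform(0.0, gamma(x))
        # Propose a new point uniformly within a window around x.
        proposal = x + rng.uniform(-window, window)
        # Accept if the point (proposal, height) is still under the curve.
        if height < gamma(proposal):
            x = proposal
        samples.append(x)  # on rejection, the old x is recorded again
    return samples

def gamma(x):
    """Unnormalized standard normal density: Z is never needed."""
    return math.exp(-0.5 * x * x)

samples = random_walk_under_density(gamma, 0.0, 50000, 2.0, random.Random(0))
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be close to 0 and 1
```

The fixed `window` is the tuning parameter this simple version is sensitive to: too small and the walk moves slowly, too large and most proposals are rejected.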

Relation with Metropolis-Hastings (MH)

Case 1: proposing to go “uphill”, \(\gamma(x^*) \ge \gamma(x)\). In this case, we always accept! See the picture:


Case 2: proposing to go “downhill”, \(\gamma(x^*) < \gamma(x)\). In this case, we may accept or reject… See the picture:


We can compute the probability that we accept! It’s the height of the green stick in bold divided by the height of the black stick in bold:

\[\frac{\gamma(x^*)}{\gamma(x_i)}\]

This quantity is called the MH ratio

Generalization: non-uniform proposal

\[\min\left\{1, \frac{\gamma(x^*)}{\gamma(x_i)}\frac{q(x_i|x^*)}{q(x^*|x_i)}\right\}\]

Notice that we indeed get back our simpler formula \(\gamma(x^*)/\gamma(x_i)\) when \(q\) is symmetric.
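A minimal Python sketch of an MH sampler using this acceptance probability follows. The drifting normal proposal is an illustrative choice of asymmetric \(q\), not something prescribed by these notes, and all names are hypothetical.

```python
import math
import random

def mh_acceptance_probability(gamma, q, x, x_star):
    """MH acceptance probability; q(a, b) is the density of proposing a from b."""
    ratio = (gamma(x_star) * q(x, x_star)) / (gamma(x) * q(x_star, x))
    return min(1.0, ratio)

def metropolis_hastings(gamma, propose, q, x0, n_samples, rng):
    x = x0
    samples = []
    for _ in range(n_samples):
        x_star = propose(x, rng)
        if rng.random() < mh_acceptance_probability(gamma, q, x, x_star):
            x = x_star
        samples.append(x)  # a rejection repeats the current point
    return samples

# Illustrative asymmetric proposal: a normal step that drifts to the right.
DRIFT = 0.5

def propose(x, rng):
    return rng.gauss(x + DRIFT, 1.0)

def q(a, b):
    """Density (up to a constant that cancels) of proposing a when at b."""
    return math.exp(-0.5 * (a - b - DRIFT) ** 2)

def gamma(x):
    return math.exp(-0.5 * x * x)  # unnormalized standard normal

samples = metropolis_hastings(gamma, propose, q, 0.0, 40000, random.Random(0))
print(sum(samples) / len(samples))  # should be close to 0 despite the drift
```

The factor \(q(x_i|x^*)/q(x^*|x_i)\) compensates for the proposal's preference for moving right; passing a constant `q` recovers the symmetric case.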

First failure mode: unidentifiability


Why is this landscape hard to explore for slice sampling?

Understanding the failure


Note: other situations may cause this as well, but unidentifiability (or “weak identifiability”) is the most common one

Second failure mode: “multimodality”


Understanding the failure
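The following Python sketch illustrates the problem on a made-up target with two well-separated modes. It uses a random walk with the symmetric accept/reject rule \(\min\{1, \gamma(x^*)/\gamma(x)\}\) and a small window; the target, window size, and names are all illustrative assumptions.

```python
import math
import random

def gamma(x):
    """Unnormalized mixture of two well-separated normal bumps at -10 and +10."""
    return math.exp(-0.5 * (x + 10.0) ** 2) + math.exp(-0.5 * (x - 10.0) ** 2)

def random_walk_mh(gamma, x0, n_samples, window, rng):
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-window, window)
        if rng.random() < min(1.0, gamma(proposal) / gamma(x)):
            x = proposal
        samples.append(x)
    return samples

# Start in the left mode and use a window much smaller than the gap between modes.
samples = random_walk_mh(gamma, -10.0, 20000, 1.0, random.Random(0))
fraction_right = sum(1 for s in samples if s > 0) / len(samples)
print(fraction_right)  # the target puts half of its mass on x > 0
```

To reach the right mode, the walk would have to cross a region where \(\gamma\) is astronomically small, so in practice it stays in the mode where it started and the estimate is badly off.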


Solutions to these two failure modes

Discussion of last week’s example: change point problem / segmentation


Change point problem

Data looks like this:


Can you build a Bayesian model for that?

How to pick distributions?

Frequently used distributions

By type:

By interpretation:

Mathematical model