Alexandre Bouchard-Côté
Exercise debrief
There is a way to make this idea even more powerful in the context of MCMC (see reading “Getting it right” and Cook et al.)
Recall: \({{\bf 1}}[\cdot]\) denotes the indicator of an event, i.e. a function that is 1 if the input outcome \(s\) is in the event, and 0 otherwise.
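In R, logicals coerce to 0/1, so an indicator is just a comparison wrapped in `as.numeric`. A minimal sketch (the event \([0.2, 0.8]\) is made up for illustration):

```r
# Indicator of the event "s lies in [0.2, 0.8]" (an arbitrary example event)
indicator <- function(s) as.numeric(s >= 0.2 & s <= 0.8)

indicator(0.5) # 1: 0.5 is in the event
indicator(0.9) # 0: 0.9 is not
```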
In the last question, we approximated the probability above by averaging over datasets.
If you do not have many datasets, you can instead average over the observations \(i\) within a single dataset (this is called “leave one out”, LOO). This is more subtle because you lose the iid structure across the averaged terms, so the properties of the goodness-of-fit procedure will be more dependent on the model. This leads to \[{\mathbb{P}}(Y_i \in C_\alpha(Y_{-i})) \approx \frac{1}{n} \sum_{i=1}^n {{\bf 1}}[y_{i} \in C_\alpha(y_{-i})]\]
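To make the LOO average concrete, here is a minimal sketch in a toy model that is *not* from the original exercise: \(Y_i \sim \mathcal{N}(\mu, 1)\) with a flat prior on \(\mu\), in which case the posterior predictive for \(Y_i\) given \(y_{-i}\) is \(\mathcal{N}(\bar y_{-i}, \sqrt{1 + 1/(n-1)})\), and \(C_\alpha(y_{-i})\) is taken to be the central \((1-\alpha)\) predictive interval:

```r
set.seed(1)
alpha <- 0.1
n <- 500
y <- rnorm(n, mean = 3, sd = 1) # one synthetic dataset

loo_inclusions <- sapply(seq_len(n), function(i) {
  y_rest <- y[-i]
  pred_mean <- mean(y_rest)
  pred_sd <- sqrt(1 + 1/length(y_rest)) # predictive sd under flat prior, known variance
  # Central (1 - alpha) predictive interval C_alpha(y_{-i})
  lower <- qnorm(alpha/2, pred_mean, pred_sd)
  upper <- qnorm(1 - alpha/2, pred_mean, pred_sd)
  as.numeric(y[i] >= lower & y[i] <= upper) # the indicator 1[y_i in C_alpha(y_{-i})]
})
mean(loo_inclusions) # LOO coverage estimate; should be close to 1 - alpha
```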
What to do if your model is misspecified?
One obvious possibility is to make the model better. But in the following we show that under certain conditions, another possibility is to collect more data.
Now suppose we are using the “wrong prior”, i.e. data generation uses the uniform prior but we base our posterior computation on the non-uniform prior from earlier.
Again, let’s do it for small (10 launches), medium (100), and large (1000) datasets, plotting the nominal coverage (dashed line) against the actual coverage (solid line).
# Using the same non-uniform prior as before for the posterior calculation
K <- 1000
rdunif <- function(max) { return(ceiling(max * runif(1))) }

posterior_distribution <- function(prior_probs, n_successes, n_trials) {
  K <- length(prior_probs) - 1 # the grid has K+1 values that p can assume
  n_fails <- n_trials - n_successes
  p <- seq(0, 1, 1/K)
  posterior_probs <-                # 1. compute gamma(i), the unnormalized posterior:
    prior_probs *                   #    - prior
    p^n_successes * (1-p)^n_fails   #    - likelihood
  posterior_probs <- posterior_probs / sum(posterior_probs) # 2. normalize gamma(i)
  post_prob <- rbind(p, posterior_probs)
  return(post_prob)
}
high_density_intervals <- function(alpha, posterior_probs) {
  ordered_probs <- posterior_probs[, order(posterior_probs[2,], decreasing = TRUE)]
  cumulative_probs <- cumsum(ordered_probs[2,])
  index <- which.max(cumulative_probs >= (1 - alpha))
  return(ordered_probs[, 1:index, drop = FALSE])
}
hdi_coverage_pr <- function(n_datapoints) {
  n_inclusions <- 0
  for (repetition in seq_len(n_repeats)) {
    i <- rdunif(K + 1) - 1 # always generate the data using a uniform prior
    true_p <- i/K
    x <- rbinom(1, n_datapoints, true_p)
    post <- posterior_distribution(prior_used_for_computing_posterior, x, n_datapoints)
    # Check whether the true parameter is one of the grid points in the HDI
    # (comparing up to a numerical tolerance)
    if (any(abs(true_p - high_density_intervals(alpha, post)[1,]) < 1e-9)) {
      n_inclusions <- n_inclusions + 1
    }
  }
  return(n_inclusions/n_repeats) # fraction of simulations where the true parameter was in the interval
}
prior_used_for_computing_posterior <- dnorm(seq(0, 1, 1/K), mean = 0.2, sd = 0.2)
prior_used_for_computing_posterior <- prior_used_for_computing_posterior / sum(prior_used_for_computing_posterior)

set.seed(1)
n_repeats <- 1000
alpha <- 0.1
df <- data.frame("n_observations" = c(10, 100, 1000))
df$coverage_pr <- sapply(df$n_observations, hdi_coverage_pr)

library(ggplot2)
ggplot(data = df, aes(x = n_observations, y = coverage_pr)) +
  ylim(0, 1) +
  geom_hline(yintercept = 1 - alpha, linetype = "dashed", color = "black") +
  geom_line()
Bernstein-von Mises: under certain conditions, even when the prior is misspecified, the actual coverage of credible intervals converges to the nominal coverage as the number of observations grows. (“Converges” just means “gets arbitrarily close”.)
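Informally, one common way to state the theorem (a sketch; this notation is not from the original text) is that the posterior approaches a normal distribution centred at the maximum likelihood estimate: \[\left\| \Pi\left(\cdot \mid Y_{1:n}\right) - \mathcal{N}\left(\hat\theta_n,\ \tfrac{1}{n} I(\theta_0)^{-1}\right) \right\|_{\mathrm{TV}} \to 0,\] where \(\hat\theta_n\) is the MLE, \(I(\theta_0)\) is the Fisher information at the true parameter, and the convergence is in probability. Note that the limiting distribution does not depend on the prior, which is why a misspecified (but suitably non-degenerate) prior washes out asymptotically.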
Conditions include, but are not limited to:
Practical consequence: