
Tutorial on Model Selection

In this tutorial, we will look at how model saturation can be used to compute Bayes factors.

We start with a model assuming no preference, i.e. a single Poisson rate shared by both counts:

ex.hole.First
package ex.hole

model First {
  random List&lt;IntVar&gt; xs ?: fixedIntList(12, 32)
  random RealVar alpha ?: latentReal

  laws {
    alpha ~ Exponential(0.1)
    for (IntVar x : xs) {
      x | alpha ~ Poisson(alpha)
    }
  }
}

Now look at the standard output; in the last round, take note of the line "logNormalizationContantProgress [ round=11 value=___ ]".

This is an estimate of the natural logarithm of the probability of the data (evidence or marginal likelihood). This estimate is based on the stepping stone method combined with parallel tempering, methods that we will cover in detail in a few weeks.
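In symbols (notation introduced here for convenience, not taken from the tutorial): writing $$\theta$$ for the latent variables of a model $$m$$ and $$x$$ for the data, the quantity being estimated is

$$\log Z_m = \log p(x \mid m) = \log \int p(\theta \mid m) \, p(x \mid \theta, m) \, \mathrm{d}\theta.$$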

ex.hole.Second
package ex.hole

model Second {
  random List&lt;IntVar&gt; xs ?: fixedIntList(12, 32)
  random List&lt;RealVar&gt; betas ?: latentRealList(2)

  laws {
    for (int i : 0..1) {
      betas.get(i) ~ Exponential(0.1)
      xs.get(i) | RealVar beta = betas.get(i) ~ Poisson(beta)
    }
  }
}

Take note of the estimate of the marginal likelihood for the second model.
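These two estimates are enough to approximate a Bayes factor. Writing $$\hat{Z}_1$$ and $$\hat{Z}_2$$ for the stepping stone estimates of the two marginal likelihoods (again our notation), the log Bayes factor in favour of the first model is

$$\log \widehat{\mathrm{BF}}_{1,2} = \log \hat{Z}_1 - \log \hat{Z}_2,$$

and under equal prior model probabilities the posterior probability of the first model is $$\hat{Z}_1 / (\hat{Z}_1 + \hat{Z}_2)$$.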

Consider now an augmented model based on model saturation:

ex.hole.Both
package ex.hole

model Both {
  // model selection
  random IntVar indicator ?: latentInt

  // first model's latent variables
  random RealVar alpha ?: latentReal

  // second model's latent variables
  random List&lt;RealVar&gt; betas ?: latentRealList(2)

  // data
  random List&lt;IntVar&gt; xs ?: fixedIntList(12, 32)

  laws {
    indicator ~ Bernoulli(0.5)

    // priors for first model
    alpha ~ Exponential(0.1)

    // priors for second model
    for (int i : 0..1) {
      betas.get(i) ~ Exponential(0.1)
    }

    // likelihood
    for (int i : 0..1) {
      xs.get(i) | RealVar beta = betas.get(i), alpha, indicator ~ Poisson(if (indicator == 0) alpha else beta)
    }
  }
}
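Why saturation works: under the uniform Bernoulli(0.5) prior on indicator, the posterior odds of the indicator reduce to the Bayes factor (a standard identity, spelled out here for clarity):

$$\frac{p(\text{indicator} = 0 \mid x)}{p(\text{indicator} = 1 \mid x)} = \frac{p(x \mid \text{indicator} = 0)}{p(x \mid \text{indicator} = 1)} = \mathrm{BF}_{1,2}.$$

Hence the fraction of MCMC samples with indicator == 0 directly estimates the posterior probability of the first model. Note that when indicator == 0 the betas are still sampled, but only from their prior, so they integrate out and do not affect the evidence for the first model (and symmetrically for alpha when indicator == 1).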

Exercise 1

Compare the approximation based on model saturation (the posterior distribution of indicator in Both) with the one obtained from the two separate stepping stone estimates; a sketch of the conversion follows.
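To make the two approximations comparable, convert the pair of log evidence estimates into a posterior model probability. A minimal sketch in Python, where the numeric values are placeholders to be replaced by the estimates you recorded:

import math

# Placeholder values: substitute the stepping stone estimates you recorded
# for ex.hole.First and ex.hole.Second.
log_Z1 = -10.0  # log marginal likelihood of First
log_Z2 = -11.0  # log marginal likelihood of Second

# Posterior probability of First under equal prior model probabilities,
# computed via the log-sum-exp trick for numerical stability.
m = max(log_Z1, log_Z2)
p_first = math.exp(log_Z1 - m) / (math.exp(log_Z1 - m) + math.exp(log_Z2 - m))

# Compare p_first with the fraction of MCMC samples where indicator == 0
# in the saturated model ex.hole.Both.
print(p_first)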

Exercise 2

Suppose now the dataset is $$x_1 = 12, x_2 = 13$$ (i.e. change fixedIntList(12, 32) to fixedIntList(12, 13) in each model). Since the two counts are now nearly identical, a reasonable model selection procedure should give more weight to the simpler model; that is, it should penalize the additional complexity of the two-parameter model. Do you observe this?
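As an optional sanity check (this derivation is not part of the original exercise; it follows from Gamma-Poisson conjugacy, an Exponential(0.1) prior being a Gamma(1, 0.1) distribution), both marginal likelihoods are available in closed form:

$$p(x_1, x_2 \mid \text{First}) = \frac{0.1 \, (x_1 + x_2)!}{x_1! \, x_2! \, (2.1)^{x_1 + x_2 + 1}}, \qquad p(x_1, x_2 \mid \text{Second}) = \prod_{i=1}^{2} \frac{0.1}{(1.1)^{x_i + 1}}.$$

Taking logs and plugging in the data gives an exact reference point for both the stepping stone and the saturation estimates.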