Normalization constant estimation via reversible jumps

Alexandre Bouchard-Côté

Normalization constant

Goal: computing

\[Z = \int \gamma(x) {\text{d}}x\]
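For intuition, here is a small numerical sketch (illustrative only, not from the source): for the one-dimensional unnormalized density \(\gamma(x) = e^{-x^2/2}\), quadrature recovers \(Z = \sqrt{2\pi}\).

```python
import numpy as np
from scipy.integrate import quad

# Illustrative unnormalized density gamma(x) = exp(-x^2 / 2);
# its normalization constant is Z = sqrt(2 * pi) ~ 2.5066.
gamma = lambda x: np.exp(-x**2 / 2)

Z, _ = quad(gamma, -np.inf, np.inf)  # one-dimensional numerical quadrature
print(Z, np.sqrt(2 * np.pi))         # both ~ 2.5066
```

Quadrature only works in very low dimension; the Monte Carlo machinery below is what we turn to when the latent space is high-dimensional or of varying dimension.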

Main motivation in Bayesian statistics: comparing models via Bayes factors, i.e., ratios of marginal likelihoods; see T8-model-selection-basics.html

Situation where reversible jump shines: when the set of models to consider is countably infinite.

Reversible jump MCMC

Two main ideas:

1. An augmented distribution defined on a single, common state space.
2. Better proposals for jumping between models, based on dimensionality matching.

Reversible jump: augmented distribution

We will illustrate this augmented distribution on the “rabbit hole” example we used to introduce “model saturation”.

See T8-model-selection-basics.html#(3) for a review of this example

Recall: model saturation

\[ \begin{align*} {\mathscr{Z}}' &= \{1, 2\} \times {\mathscr{Z}}_1 \times {\mathscr{Z}}_2 \\ \tilde p(\mu, z_1, z_2, x) &= p(\mu) p_1(z_1) p_2(z_2) \ell_{\mu}(x | z_{\mu}) \end{align*} \]

Reversible jump augmentation exploits the fact that each state space is a product space: for example, here \({\mathscr{Z}}_i = {\mathbf{R}}^i\). We then only keep a copy of the largest space considered (here \({\mathscr{Z}}_2 = {\mathbf{R}}^2\)), plus a categorical variable \(\mu\) to keep track of which model we are currently in.

\[ \begin{align*} {\mathscr{Z}}' &= \{1, 2\} \times {\mathscr{Z}}_2 \\ \tilde p(\mu, z, x) &= p(\mu) (p_1(z_1) a(z_2) \ell_1(x | z_1))^{{{\bf 1}}[\mu = 1]} (p_2(z) \ell_2(x | z))^{{{\bf 1}}[\mu = 2]} \end{align*} \]

Here \(a\) is any auxiliary distribution you want, often taken to be the same as one of the priors (the only formal requirement is that you should be able to sample iid from it).
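To make this concrete, here is a hypothetical sketch (assumptions, not from the source): model 1 has \(z_1 \in {\mathbf{R}}\), model 2 has \(z = (z_1, z_2) \in {\mathbf{R}}^2\), the priors and \(a\) are standard normals, and the likelihoods are Gaussians centered at \(z_1\) and \(z_1 + z_2\) for a single observation \(x\). The function below evaluates the unnormalized augmented log density \(\log \tilde p(\mu, z, x)\).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-model instance of the augmented distribution:
# model 1: z_1 in R, model 2: (z_1, z_2) in R^2, standard normal priors,
# a = standard normal, and likelihoods ell_1(x | z_1) = N(x; z_1, 1),
# ell_2(x | z) = N(x; z_1 + z_2, 1) for a single observation x.

log_prior_mu = np.log([0.5, 0.5])  # p(mu): uniform over the two models

def log_p_tilde(mu, z, x):
    """Unnormalized log density of the augmented target p~(mu, z, x)."""
    z1, z2 = z
    if mu == 1:
        # p_1(z_1) a(z_2) ell_1(x | z_1)
        return (log_prior_mu[0]
                + norm.logpdf(z1) + norm.logpdf(z2)        # prior p_1 and auxiliary a
                + norm.logpdf(x, loc=z1, scale=1.0))       # ell_1
    else:
        # p_2(z) ell_2(x | z)
        return (log_prior_mu[1]
                + norm.logpdf(z1) + norm.logpdf(z2)        # prior p_2
                + norm.logpdf(x, loc=z1 + z2, scale=1.0))  # ell_2

print(log_p_tilde(1, (0.3, -0.5), x=1.2))
print(log_p_tilde(2, (0.3, -0.5), x=1.2))
```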

Note: we recover the two models as follows:

\[ \begin{align*} \tilde p(z_1, x | \mu = 1) &= p_1(z_1) \ell_1(x | z_1) \\ \tilde p(z, x | \mu = 2) &= p_2(z_1, z_2) \ell_2(x | z) \end{align*} \]
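The first identity holds because the auxiliary variable integrates out: since \(a\) is a probability density,

\[ \tilde p(z_1, x | \mu = 1) = \int p_1(z_1)\, a(z_2)\, \ell_1(x | z_1) \,{\text{d}}z_2 = p_1(z_1)\, \ell_1(x | z_1) \int a(z_2) \,{\text{d}}z_2 = p_1(z_1)\, \ell_1(x | z_1), \]

so conditioning on \(\mu = 1\) and marginalizing the auxiliary \(z_2\) recovers model 1 exactly; \(\mu = 2\) recovers model 2 directly.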

Generalization to any finite set of models is simple. What about an infinite set of models?

Infinite set of models

Suppose now we have an infinite set of models with state spaces \({\mathscr{Z}}_1, {\mathscr{Z}}_2, {\mathscr{Z}}_3, \dots\), where again each model’s state space is a product space, for example \({\mathscr{Z}}_i = {\mathbf{R}}^i\).

Do you see a problem?

Recall: in the construction we just did, “keep a copy of the largest space considered […] plus a categorical variable \(\mu\) to keep track of which model we are currently in”

MCMC over an infinite space

Key idea: a “lazy” representation of infinite-dimensional objects. We only keep in memory the prefix of \(z\) that we need at the current iteration, and use iid draws from \(a\) to fill in missing parts when we need more.
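A minimal sketch of such a lazy representation (hypothetical code, with \(a\) taken to be a standard normal for illustration):

```python
import numpy as np

class LazyState:
    """Lazily materialized infinite vector z: coordinates beyond the current
    prefix are filled in with iid draws from the auxiliary distribution a
    only when they are first requested."""

    def __init__(self, a_sampler, rng):
        self.prefix = []            # coordinates materialized so far
        self.a_sampler = a_sampler  # draws one iid sample from a
        self.rng = rng

    def get_prefix(self, length):
        # Extend the stored prefix with iid draws from a if needed.
        while len(self.prefix) < length:
            self.prefix.append(self.a_sampler(self.rng))
        return self.prefix[:length]

rng = np.random.default_rng(1)
z = LazyState(a_sampler=lambda rng: rng.standard_normal(), rng=rng)
print(z.get_prefix(2))   # model with 2 dimensions
print(z.get_prefix(4))   # a later jump to 4 dimensions reuses the first 2
```

When the sampler visits a higher-dimensional model, the extra coordinates are filled in with fresh iid draws from \(a\); coordinates already materialized are reused.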

Note that the same idea of a “lazy” representation of infinite-dimensional objects is also key to many inference algorithms used in the Bayesian non-parametric literature.

Second idea: better proposals

Reversible jump algorithm

Dimensionality matching: a necessary condition for the mapping \(\Psi\) to be a diffeomorphism is that the input dimensionality of \(\Psi\) should match its output dimensionality.

Consequence: let us say that we want to “jump” from a model with \(m_1\) dimensions into one with \(m_2\) dimensions. What constraints do we have on the number \(n_1\) of auxiliary variables we add to the first model, and the number \(n_2\) we add to the second? Dimensionality matching forces \(m_1 + n_1 = m_2 + n_2\); for example, to jump from \({\mathbf{R}}^1\) to \({\mathbf{R}}^2\) we can take \(n_1 = 1\) and \(n_2 = 0\).

Notation:

- \(i\) and \(i'\): current and proposed model indices; \(x\) and \(x'\): current state (in model \(i\)) and proposed state (in model \(i'\)).
- \(p(i)\): prior probability of model \(i\); \(\pi_i\): unnormalized target (e.g., posterior) density within model \(i\).
- \(\rho_{i \to i'}\): probability of proposing a jump from model \(i\) to model \(i'\).
- \(u_i\): auxiliary variables added to model \(i\) so that dimensionalities match, with density \(g_i\).
- \(\Psi\): the diffeomorphism mapping \((x, u_i)\) to \((x', u_{i'})\); \(J\): its Jacobian determinant.

Ratio for RJMCMC:

\[ \frac{p(i')\pi_{i'}(x')}{p(i)\pi_i(x)} \frac{\rho_{i'\to i}}{\rho_{i\to i'}} \frac{g_{i'}(u_{i'})}{g_{i}(u_{i})} \left| J(x, u_i) \right| \]

Example: textbook, page 365.
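To see how the pieces of the ratio fit together, here is a hedged sketch (hypothetical, not the textbook example): a “birth” move from model 1 (\(z_1 \in {\mathbf{R}}\)) to model 2 (\(z \in {\mathbf{R}}^2\)), using the toy models from the earlier sketch, one auxiliary variable \(u \sim g = N(0,1)\) for the forward move, none for the reverse move, and the identity map \(\Psi(z_1, u) = (z_1, u)\), whose Jacobian determinant is 1.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical toy targets (same assumptions as the earlier sketch):
def log_pi_1(z1, x):   # unnormalized target of model 1: p_1(z_1) ell_1(x | z_1)
    return norm.logpdf(z1) + norm.logpdf(x, loc=z1)

def log_pi_2(z, x):    # unnormalized target of model 2: p_2(z) ell_2(x | z)
    z1, z2 = z
    return norm.logpdf(z1) + norm.logpdf(z2) + norm.logpdf(x, loc=z1 + z2)

def birth_log_ratio(z1, u, x,
                    log_p=(np.log(0.5), np.log(0.5)),        # p(1), p(2)
                    log_rho_12=np.log(0.5), log_rho_21=np.log(0.5)):
    """Log of the RJMCMC ratio for the move 1 -> 2 with Psi(z1, u) = (z1, u).
    |J| = 1, and the reverse (death) move needs no auxiliaries, so its
    g_{i'}(u_{i'}) factor is an empty product equal to 1."""
    return ((log_p[1] + log_pi_2((z1, u), x))      # numerator: p(2) pi_2(x')
            - (log_p[0] + log_pi_1(z1, x))         # denominator: p(1) pi_1(x)
            + log_rho_21 - log_rho_12              # rho_{2->1} / rho_{1->2}
            - norm.logpdf(u))                      # 1 / g_1(u_1)

rng = np.random.default_rng(0)
z1, x = 0.3, 1.2
u = rng.standard_normal()                          # u ~ g_1 = N(0, 1)
accept_prob = min(1.0, np.exp(birth_log_ratio(z1, u, x)))
print(u, accept_prob)
```

The identity map is simply the easiest choice satisfying dimensionality matching here (\(1 + 1 = 2 + 0\)); designing better maps \(\Psi\) is what the “better proposals” idea refers to.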