Normalization constant estimation via stepping stone

Alexandre Bouchard-Côté

Normalization constant

Goal computing

\[Z = \int \gamma(x) {\text{d}}x\]

Main motivation in Bayesian statistics comparing models via Bayes factors, i.e. ratios of marginal likelihood.

Recall from T8-model-selection-basics.html:

Note in contrast to say computing say a posterior mean, a marginal likelihood is not an expectation, in the sense that \(\ux x\) in not a probability distribution in general. This is why it requires its own set of computational methods.



\[\gamma(x) = |\sin(x) e^{-x^2}| {{\bf 1}}[x > 0]\]

We want to approximate

\[Z = \int \gamma(x) {\text{d}}x\]

Poll: which of these converge to \(Z\) (a.s.)

  1. \((1/n) \sum_i |\sin(X_i) e^{-X_i^2}|/e^{-X_i}\) for \(X_i \sim \text{Exp}(1)\)
  2. \((1/n) \sum_i |\sin(X_i) e^{-X_i^2}|/(2 e^{-2 X_i})\) for \(X_i \sim \text{Exp}(2)\)
  3. \((1/n) \sum_i |\sin(X_i) e^{-X_i^2}| {{\bf 1}}[X_i > 0] \sqrt{2\pi}/e^{-X_i^2/2}\) for \(X_i \sim \text{Normal}(0, 1)\)
  4. All of the above
  5. None of the above

Background: importance sampling

We want to approximate

\[Z = \int \gamma(x) {\text{d}}x\]

All of the choices follow the following structure: \(X_i \sim q\), and

\[ \begin{align*} Z = \int \gamma(x) {\text{d}}x &= \int \gamma(x) \frac{q(x)}{q(x)} {\text{d}}x \\ &= \int \frac{\gamma(x)}{q(x)} q(x) {\text{d}}x \\ &= {\mathbf{E}}\left[ \frac{\gamma(X_1)}{q(X_1)} \right] \leftarrow \frac{1}{n} \sum_{i=1}^n \frac{\gamma(X_i)}{q(X_i)} \end{align*} \]

This is just importance sampling!

Note for this to work, \(q\) is assumed to be properly normalized.


Naive method to compute normalization constants with importance sampling

We want to approximate

\[Z = \int \gamma(x) {\text{d}}x\]

\[ \frac{1}{n} \sum_{i=1}^n \frac{\gamma(X_i)}{q(X_i)} = \frac{1}{n} \sum_{i=1}^n \frac{{\text{prior}}(X_i) {\text{likelihood}}(X_i)}{{\text{prior}}(X_i)} = \frac{1}{n} \sum_{i=1}^n {\text{likelihood}}(X_i) \]

Stepping stone: a better way to use importance sampling for normalization constants

Stepping stone: some details


\[Z = Z_N = \frac{Z_N}{Z_{N-1}} \frac{Z_{N-1}}{Z_{N-2}} \dots \frac{Z_{1}}{Z_{0}} Z_0\]

Estimating \(Z_{i+1}/Z_i\) based on importance sampling:

\[ \begin{align*} \frac{Z_{i+1}}{Z_i} = \frac{1}{C} Z = \int \gamma(x) {\text{d}}x &= \frac{1}{C} \int \gamma(x) \frac{\tilde q(x)}{\tilde q(x)} {\text{d}}x \\ &= \frac{1}{C} \int \frac{\gamma(x)}{\tilde q(x)} \tilde q(x) {\text{d}}x \\ &= \int \frac{\gamma(x)}{\tilde q(x)} q(x) {\text{d}}x \\ &= {\mathbf{E}}\left[ \frac{\gamma(X_1)}{\tilde q(X_1)} \right] \leftarrow \frac{1}{n} \sum_{i=1}^n \frac{\gamma(X_i)}{\tilde q(X_i)} \end{align*} \]

For more information, see:

This is the default method used in Blang, which you can find in Plots > monitoringPlots > logNormalizationContantProgress.pdf, as we have seen earlier in the course, see e.g. Model_selection.html