# Normalization constant

Goal: computing

$Z = \int \gamma(x) {\text{d}}x$

The main motivation in Bayesian statistics is comparing models via Bayes factors, i.e. ratios of marginal likelihoods.

Recall from T8-model-selection-basics.html:

• Let $$\gamma(x) = {\text{prior}}(x) {\text{likelihood}}(x)$$
• Computing $$Z$$ for a first model.
• Computing $$Z'$$ for a second model.
• Taking the ratio $$Z/Z'$$ gives us the Bayes factor.

Note that, in contrast to computing, say, a posterior mean, a marginal likelihood is not an expectation, in the sense that $$\gamma(x)$$ is not a probability distribution in general. This is why it requires its own set of computational methods.

# Background

Suppose

$\gamma(x) = |\sin(x) e^{-x^2}| {{\bf 1}}[x > 0]$

We want to approximate

$Z = \int \gamma(x) {\text{d}}x$

### Poll: which of these converge to $$Z$$ (a.s.)?

1. $$(1/n) \sum_i |\sin(X_i) e^{-X_i^2}|/e^{-X_i}$$ for $$X_i \sim \text{Exp}(1)$$
2. $$(1/n) \sum_i |\sin(X_i) e^{-X_i^2}|/(2 e^{-2 X_i})$$ for $$X_i \sim \text{Exp}(2)$$
3. $$(1/n) \sum_i |\sin(X_i) e^{-X_i^2}| {{\bf 1}}[X_i > 0] \sqrt{2\pi}/e^{-X_i^2/2}$$ for $$X_i \sim \text{Normal}(0, 1)$$
4. All of the above
5. None of the above

# Background: importance sampling

We want to approximate

$Z = \int \gamma(x) {\text{d}}x$

All of the choices fit the following structure: $$X_i \sim q$$, and

\begin{align*} Z = \int \gamma(x) {\text{d}}x &= \int \gamma(x) \frac{q(x)}{q(x)} {\text{d}}x \\ &= \int \frac{\gamma(x)}{q(x)} q(x) {\text{d}}x \\ &= {\mathbf{E}}\left[ \frac{\gamma(X_1)}{q(X_1)} \right] \leftarrow \frac{1}{n} \sum_{i=1}^n \frac{\gamma(X_i)}{q(X_i)} \end{align*}

This is just importance sampling!

Note that for this to work, $$q$$ is assumed to be properly normalized.
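As a minimal numerical sketch of this estimator on the running example (assuming NumPy; the Exp(1) proposal is an illustrative choice whose support $[0, \infty)$ covers that of $\gamma$):

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(x):
    # unnormalized target from the running example: |sin(x)| e^{-x^2} 1[x > 0]
    return np.abs(np.sin(x)) * np.exp(-x**2) * (x > 0)

def q(x):
    # Exp(1) proposal density; normalized, with support covering that of gamma
    return np.exp(-x) * (x > 0)

n = 500_000
x = rng.exponential(1.0, size=n)   # X_i ~ q
Z_hat = np.mean(gamma(x) / q(x))   # (1/n) sum_i gamma(X_i) / q(X_i)
print(Z_hat)
```

Since the weights $\gamma(x)/q(x) = |\sin(x)|\,e^{-x^2 + x}$ are bounded here, the estimator has low variance; this is generally not the case when the proposal's tails are thinner than the target's.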

Next:

• First, a naive way to approximate normalization constants using importance sampling
• Why it is broken
• How to fix it

# Naive method to compute normalization constants with importance sampling

We want to approximate

$Z = \int \gamma(x) {\text{d}}x$

• In the Bayesian context $$\gamma(x) = {\text{prior}}(x) {\text{likelihood}}(x)$$
• First idea: use $$q(x) = {\text{prior}}(x)$$, get the estimator:

$\frac{1}{n} \sum_{i=1}^n \frac{\gamma(X_i)}{q(X_i)} = \frac{1}{n} \sum_{i=1}^n \frac{{\text{prior}}(X_i) {\text{likelihood}}(X_i)}{{\text{prior}}(X_i)} = \frac{1}{n} \sum_{i=1}^n {\text{likelihood}}(X_i)$

• This idea works very poorly in practice!
• Reason: “a sample of size approximately $$\exp(\text{KL}(\gamma\|q))$$ is necessary and sufficient for accurate estimation by importance sampling” see https://arxiv.org/abs/1511.01437
• $$\text{KL}(\gamma\|q)$$ is the KL divergence
• A similar idea (also bad) is to use the posterior distribution $$\pi$$ as $$q$$; this yields the so-called “harmonic mean estimator”, also known as the “Worst Monte Carlo Method Ever”
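The failure mode of the prior-as-proposal idea can be seen numerically. In the hypothetical sketch below (assuming NumPy), a $N(0,1)$ prior is paired with 50 observations, so the posterior concentrates far from the prior and the effective sample size of the importance weights collapses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conjugate setup: prior N(0, 1), likelihood N(y_j | x, 1), 50 observations
ys = rng.normal(2.0, 1.0, size=50)

n = 10_000
x = rng.normal(0.0, 1.0, size=n)   # X_i ~ q = prior
# log importance weights = log-likelihood (up to an additive constant)
log_w = -0.5 * np.sum((ys[None, :] - x[:, None])**2, axis=1)
w = np.exp(log_w - log_w.max())    # subtract max for numerical stability
ess = w.sum()**2 / np.sum(w**2)    # effective sample size of the weighted sample
print(f"ESS = {ess:.0f} out of {n}")
```

A handful of prior draws that happen to land near the posterior mode dominate the weights, which is the practical symptom of the $\exp(\text{KL})$ sample-size requirement quoted above.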

# Stepping stone: a better way to use importance sampling for normalization constants

• Suppose we have not one joint distribution but instead a sequence $$\gamma_0, \gamma_1, \gamma_2, \dots, \gamma_N$$
• Let $$Z_i = \int \gamma_i(x) {\text{d}}x$$
• Suppose $$Z_0$$ is known (e.g., for prior distributions, the normalization constant is often known)
• We are interested in $$Z_N$$
• $$\pi_i = \gamma_i / Z_i$$
• Example: $$\pi_i$$ obtained by annealing between the prior ($$\pi_0$$) and posterior ($$\pi_N$$), as in the parallel tempering topic
• Insight: in many practical problems, the value of $$N$$ needed for $$\text{KL}(\pi_i\|\pi_{i+1}) = O(1)$$ grows slowly (probably polynomially?)

# Stepping stone: some details

Write:

$Z = Z_N = \frac{Z_N}{Z_{N-1}} \frac{Z_{N-1}}{Z_{N-2}} \dots \frac{Z_{1}}{Z_{0}} Z_0$

• Each ratio will be estimated using an importance sampling estimator
• The last factor, $$Z_0$$, is assumed to be known.

Estimating $$Z_{i+1}/Z_i$$ based on importance sampling:

• We have samples for $$\pi_{i+1}$$ and $$\pi_i$$
• Let us use $$q = \pi_i$$ as the proposal (we want proposals with tails no thinner than the target's; $$\pi_i$$ is closer to the prior, hence will often have heavier tails), and hence the unnormalized target is $$\gamma = \gamma_{i+1}$$
• Twist: we cannot evaluate $$q$$, only an unnormalized version, $$q = \tilde q / C$$.

\begin{align*} \frac{Z_{i+1}}{Z_i} = \frac{1}{C} \int \gamma(x) {\text{d}}x &= \frac{1}{C} \int \gamma(x) \frac{\tilde q(x)}{\tilde q(x)} {\text{d}}x \\ &= \frac{1}{C} \int \frac{\gamma(x)}{\tilde q(x)} \tilde q(x) {\text{d}}x \\ &= \int \frac{\gamma(x)}{\tilde q(x)} q(x) {\text{d}}x \\ &= {\mathbf{E}}\left[ \frac{\gamma(X_1)}{\tilde q(X_1)} \right] \leftarrow \frac{1}{n} \sum_{j=1}^n \frac{\gamma(X_j)}{\tilde q(X_j)} \end{align*}
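The telescoping product can be sketched end-to-end on a toy conjugate model (a minimal illustration assuming NumPy; the Gaussian model, the linear annealing schedule, and exact sampling of each $\pi_i$ are illustrative choices made possible by conjugacy; in practice the samples would come from MCMC, e.g. parallel tempering):

```python
import numpy as np

rng = np.random.default_rng(2)
y = 1.5  # single (hypothetical) observation; prior N(0, 1), likelihood N(y | x, 1)

def log_lik(x):
    return -0.5 * (y - x)**2 - 0.5 * np.log(2 * np.pi)

betas = np.linspace(0.0, 1.0, 11)  # annealing schedule: gamma_i = prior * likelihood^beta_i
n = 100_000
log_Z = 0.0                        # log Z_0 = 0, since the prior is normalized

for b0, b1 in zip(betas[:-1], betas[1:]):
    # By conjugacy, pi_{b0} is N(b0*y/(1+b0), 1/(1+b0)), so we can sample it exactly here
    prec = 1.0 + b0
    x = rng.normal(b0 * y / prec, np.sqrt(1.0 / prec), size=n)
    # gamma_{i+1}(x) / gamma_i(x) = likelihood(x)^(b1 - b0)
    log_w = (b1 - b0) * log_lik(x)
    log_Z += np.log(np.mean(np.exp(log_w)))  # log of the estimated ratio Z_{i+1}/Z_i

# Exact marginal likelihood for this conjugate model: y ~ N(0, 2)
exact = -0.5 * np.log(2 * np.pi * 2.0) - y**2 / 4.0
print(log_Z, exact)
```

Because consecutive $\pi_i$ are close, each ratio's importance weights are nearly flat, so the product of the $N$ mildly-noisy ratio estimates is far more stable than a single prior-to-posterior importance sampling step.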