# Doubly intractable problems

• Recall that with MCMC we assume we can compute $$\gamma$$ pointwise
• What if we cannot even compute $$\gamma$$ pointwise?
• Often this comes up when the likelihood factor is intractable
• This setup is called “doubly intractable problems”
• Example: the generative model is a black box that can simulate datasets but we do not know how it works internally

# Approximate Bayesian Computation (ABC)

• One approach to doubly intractable problems
• Intuitive:
• approximate the likelihood by generating synthetic datasets (often easier than computing the likelihood),
• the closer the generated data are to the actual data, the higher the estimate of the likelihood
• Simplest example (ABC 1): say there are only two possible parameter values, $$\theta_1$$ and $$\theta_2$$, discrete data $$y$$
• Simulate $$y_{1,1}, y_{1,2}, \dots, y_{1,N} \sim p(\cdot | \theta_1)$$
• Compute $$\hat p(y | \theta_1) = \frac{1}{N} \sum_i {{\bf 1}}[y_{1,i} = y] \to p(y|\theta_1)$$
• Then $$y_{2,1}, y_{2,2}, \dots, y_{2,N} \sim p(\cdot | \theta_2)$$
• Compute $$\hat p(y | \theta_2) = \frac{1}{N} \sum_i {{\bf 1}}[y_{2,i} = y] \to p(y|\theta_2)$$
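The ABC 1 estimator above can be sketched in a few lines. A minimal illustration, assuming (hypothetically) that the black-box simulator is a Poisson sampler, so that the Monte Carlo estimates can be checked against the exact pmf; ABC itself only requires the ability to simulate datasets given $$\theta$$:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

y_obs = 3                      # observed discrete data point y
N = 200_000                    # number of synthetic datasets per parameter
theta1, theta2 = 2.0, 5.0      # the two candidate parameter values

y1 = rng.poisson(theta1, size=N)   # y_{1,1}, ..., y_{1,N} ~ p(.|theta1)
y2 = rng.poisson(theta2, size=N)   # y_{2,1}, ..., y_{2,N} ~ p(.|theta2)

p_hat_1 = np.mean(y1 == y_obs)     # \hat p(y|theta1): fraction of exact matches
p_hat_2 = np.mean(y2 == y_obs)     # \hat p(y|theta2)

# Exact Poisson pmfs, available here only because this toy simulator
# is not actually a black box; the estimates converge to these as N grows
p_exact_1 = math.exp(-theta1) * theta1**y_obs / math.factorial(y_obs)
p_exact_2 = math.exp(-theta2) * theta2**y_obs / math.factorial(y_obs)
```

By the law of large numbers, $$\hat p(y|\theta_j) \to p(y|\theta_j)$$ as $$N \to \infty$$, which is why ABC 1 incurs no irreducible approximation error.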
• Second example (ABC 2): what if $$y$$ is continuous/combinatorial?
• Assume we have a distance measure on datasets $$y$$, $$d(y, y')$$ (often based on summary statistics)
• Let $$\epsilon > 0$$ denote a tolerance
• Use the following estimator instead: $$\hat p(y | \theta) = \frac{1}{N} \sum_i {{\bf 1}}[d(y_i, y) < \epsilon] \to p(\{y' : d(y,y') < \epsilon\}|\theta)$$, where $$y_1, \dots, y_N \sim p(\cdot | \theta)$$
• Note: RHS is not equal to $$p(y|\theta)$$, hence the “Approximate” in ABC
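The ABC 2 estimator can be sketched similarly. A minimal illustration under assumed choices: datasets are 20 draws from $$N(\theta, 1)$$, and the distance $$d$$ compares a summary statistic, here the absolute difference of sample means:

```python
import numpy as np

rng = np.random.default_rng(0)

n_data = 20
y_obs = rng.normal(1.0, 1.0, size=n_data)   # "actual" continuous dataset

def distance(y, y_prime):
    # Distance based on a summary statistic: |difference of sample means|
    return abs(y.mean() - y_prime.mean())

def abc_likelihood_hat(theta, y, eps, N, rng):
    # \hat p(y|theta) = (1/N) sum_i 1[d(y_i, y) < eps],
    # with y_1, ..., y_N simulated from p(.|theta)
    count = 0
    for _ in range(N):
        y_sim = rng.normal(theta, 1.0, size=len(y))
        if distance(y_sim, y) < eps:
            count += 1
    return count / N

eps = 0.2   # tolerance
l1 = abc_likelihood_hat(1.0, y_obs, eps, 5_000, rng)   # theta near the truth
l2 = abc_likelihood_hat(3.0, y_obs, eps, 5_000, rng)   # theta far from it
```

Note that even as $$N \to \infty$$ this converges to $$p(\{y' : d(y,y') < \epsilon\}|\theta)$$, not $$p(y|\theta)$$: the tolerance and the choice of summary statistic both introduce approximation error.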

### Poll: which algorithms can get arbitrarily close to the true posterior by increasing computational budget?

1. ABC 1, ABC 2, and MCMC
2. ABC 1, and ABC 2
3. ABC 1 and MCMC
4. ABC 1 only
5. ABC 2 only

# Key difference between ABC and MCMC

• With MCMC, we can always get arbitrarily close to the true posterior by increasing computational budget $$K$$ (number of iterations)
• This situation is called an “exact approximation”
• With ABC, we cannot always get arbitrarily close to the true posterior by increasing the computational budget $$N$$ (number of generated datasets), or even by letting $$\epsilon \to 0$$, because $$d(\cdot, \cdot)$$ does not in general capture the full information in the likelihood
• For ABC 1, we can; for ABC 2, we cannot
• Recent work is looking at the impact of this residual error on the calibration properties of the approximation. See for example this review
• Idea: ABC is an exact approximation for a perturbed target, $$\pi_{\epsilon,\text{ABC}}(\theta) \propto \text{prior}(\theta) \int \text{likelihood}(y'|\theta) {{\bf 1}}[d(y,y') < \epsilon] {\text{d}}y'$$

# State of the art: automatic shrinking tuning of $$\epsilon$$

• The algorithm from the first slide seems hopelessly impractical; there are much better alternatives available!
• Sequential change of measure (details will be easier to follow once we hopefully cover SMC in 2 weeks)
• Uses an adaptive SMC scheme to pick the sequence $$\epsilon_n$$, see Del Moral, Doucet, Jasra
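The idea of a shrinking tolerance schedule can be caricatured in a few lines. A deliberately simplified sketch, not the Del Moral, Doucet, Jasra SMC algorithm itself: at each round, keep only the prior draws whose simulated data fall closest to the observations, and set the next tolerance $$\epsilon_n$$ to a quantile of the current distances. The Gaussian model, summary statistic, and quantile level are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

y_obs_mean = 1.0        # summary statistic of the observed data (assumed)
n_particles = 2_000
quantile = 0.5          # fraction of particles kept per round

thetas = rng.uniform(-5.0, 5.0, size=n_particles)   # draws from a flat prior
eps_schedule = []
for _ in range(5):
    # one simulated dataset (20 draws) per particle, summarized by its mean
    sims = rng.normal(thetas[:, None], 1.0, size=(len(thetas), 20)).mean(axis=1)
    dists = np.abs(sims - y_obs_mean)
    eps = np.quantile(dists, quantile)   # next tolerance: median distance
    thetas = thetas[dists < eps]         # keep the closest particles
    eps_schedule.append(float(eps))
```

Unlike the full SMC scheme, this sketch never resamples or moves the particles, so the particle count halves each round; the real algorithm reweights and rejuvenates particles so that the tolerance can shrink without degeneracy.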