Approximate Bayesian Computation
Alexandre Bouchard-Côté
Doubly intractable problems
- Recall we assume with MCMC that we can compute \(\gamma\) pointwise
- What if we cannot even compute \(\gamma\) pointwise?
- Often this comes up when the likelihood factor is intractable
- This setup is called “doubly intractable problems”
- Example: the generative model is a black box that can simulate datasets but we do not know how it works internally
Approximate Bayesian Computation (ABC)
- One approach to doubly intractable problems
- Intuition:
- approximate the likelihood by generating synthetic datasets (often easier than computing the likelihood),
- the closer the generated data are to the observed data, the higher the likelihood estimate
- Simplest example (ABC 1): say there are only two possible parameter values, \(\theta_1\) and \(\theta_2\), discrete data \(y\)
- Simulate \(y_{1,1}, y_{1,2}, \dots, y_{1,N} \sim p(\cdot | \theta_1)\)
- Compute \(\hat p(y | \theta_1) = \frac{1}{N} \sum_i {{\bf 1}}[y_{1,i} = y] \to p(y|\theta_1)\)
- Then \(y_{2,1}, y_{2,2}, \dots, y_{2,N} \sim p(\cdot | \theta_2)\)
- Compute \(\hat p(y | \theta_2) = \frac{1}{N} \sum_i {{\bf 1}}[y_{2,i} = y] \to p(y|\theta_2)\)
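The ABC 1 estimator above can be sketched as follows; the Bernoulli model is a hypothetical toy example, not from the slides:

```python
import random

def abc1_likelihood(simulate, y, theta, n=10_000, seed=0):
    """Monte Carlo estimate of p(y | theta) for discrete data:
    the fraction of N simulated datasets exactly equal to the observation."""
    rng = random.Random(seed)
    hits = sum(simulate(theta, rng) == y for _ in range(n))
    return hits / n

# Hypothetical toy model: y ~ Bernoulli(theta), observed y = 1,
# so p(y=1 | theta) = theta and the estimates should converge to theta.
simulate = lambda theta, rng: int(rng.random() < theta)
p1 = abc1_likelihood(simulate, 1, 0.2)
p2 = abc1_likelihood(simulate, 1, 0.7)
```

As \(N \to \infty\), `p1` and `p2` converge to the true likelihoods \(0.2\) and \(0.7\), which is why ABC 1 is consistent for discrete data.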
- Second example (ABC 2): what if \(y\) is continuous/combinatorial?
- Assume we have a distance measure on datasets \(y\), \(d(y, y')\) (often based on summary statistics)
- Let \(\epsilon > 0\) denote a tolerance
- Use the following estimator instead, where \(y_1, \dots, y_N \sim p(\cdot | \theta)\): \(\hat p(y | \theta) = \frac{1}{N} \sum_i {{\bf 1}}[d(y_i, y) < \epsilon] \to p(\{y' : d(y,y') < \epsilon\}|\theta)\)
- Note: RHS is not equal to \(p(y|\theta)\), hence the “Approximate” in ABC
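The ABC 2 estimator replaces exact matching with a tolerance check; a minimal sketch, using a hypothetical Gaussian toy model and absolute-difference distance:

```python
import random

def abc2_likelihood(simulate, y, theta, dist, eps, n=10_000, seed=0):
    """Estimate p({y' : d(y, y') < eps} | theta): the fraction of N
    simulated datasets falling within tolerance eps of the observation."""
    rng = random.Random(seed)
    hits = sum(dist(simulate(theta, rng), y) < eps for _ in range(n))
    return hits / n

# Hypothetical continuous model: y ~ Normal(theta, 1), observed y = 0.5.
simulate = lambda theta, rng: rng.gauss(theta, 1.0)
dist = lambda a, b: abs(a - b)
est = abc2_likelihood(simulate, 0.5, 0.0, dist, eps=0.1)
```

Note that `est` converges to the probability of the \(\epsilon\)-ball around \(y\), not to the density \(p(y|\theta)\) itself, which is exactly the residual approximation discussed above.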
Poll: which algorithms can get arbitrarily close to the true posterior by increasing the computational budget?
- ABC 1, ABC 2, and MCMC
- ABC 1, and ABC 2
- ABC 1 and MCMC
- ABC 1 only
- ABC 2 only
Key difference between ABC and MCMC
- With MCMC, we can always get arbitrarily close to the true posterior by increasing computational budget \(K\) (number of iterations)
- This situation is called an “exact approximation”
- With ABC, we cannot always get arbitrarily close to the true posterior by increasing the computational budget \(N\) (number of generated datasets), or even by letting \(\epsilon \to 0\), because \(d(\cdot, \cdot)\) does not in general capture the full information in the likelihood
- For ABC 1 we can; for ABC 2 we cannot
- Recent work is looking at the impact of this residual error on the calibration properties of the approximation. See for example this review
- Idea: ABC is an exact approximation for a perturbed target, \[\pi_{\epsilon,\text{ABC}}(\theta) \propto \text{prior}(\theta) \int \text{likelihood}(y'|\theta) {{\bf 1}}[d(y,y') < \epsilon] {\text{d}}y'\]
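One way to see that ABC exactly targets the perturbed posterior \(\pi_{\epsilon,\text{ABC}}\): accept-reject sampling that draws \(\theta\) from the prior, simulates \(y'\), and keeps \(\theta\) iff \(d(y, y') < \epsilon\). A minimal sketch on a hypothetical Gaussian conjugate model (not from the slides):

```python
import random

def abc_rejection(prior_sample, simulate, y, dist, eps,
                  n_accept=500, seed=0, max_tries=1_000_000):
    """Rejection ABC: draw theta ~ prior, y' ~ p(.|theta), and keep
    theta iff d(y, y') < eps. Accepted thetas are exact samples from
    the perturbed posterior pi_{eps,ABC}."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        theta = prior_sample(rng)
        if dist(simulate(theta, rng), y) < eps:
            accepted.append(theta)
            if len(accepted) == n_accept:
                break
    return accepted

# Hypothetical conjugate setup: theta ~ N(0,1), y | theta ~ N(theta,1),
# observed y = 1, so the true posterior is N(0.5, 0.5).
samples = abc_rejection(lambda rng: rng.gauss(0, 1),
                        lambda th, rng: rng.gauss(th, 1.0),
                        y=1.0, dist=lambda a, b: abs(a - b), eps=0.2)
post_mean = sum(samples) / len(samples)
```

For small \(\epsilon\) the perturbed posterior is close to the true one here, but the acceptance rate also shrinks with \(\epsilon\), which motivates the adaptive schemes below.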
State of the art: automatic, shrinking tuning of \(\epsilon\)
- The algorithm from the first slide seems hopelessly impractical; there are much better alternatives available!
- Sequential change of measure (the details will be easier to follow once we cover SMC, hopefully in 2 weeks)
- Uses an adaptive SMC scheme to pick the sequence \(\epsilon_n\), see Del Moral, Doucet, Jasra
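The core idea of picking the sequence \(\epsilon_n\) adaptively can be illustrated with a simplified sketch: set each \(\epsilon_n\) to a quantile of the current particles' distances. This is only a rejection-based caricature of the Del Moral, Doucet, Jasra scheme (the real algorithm uses weighted particles and MCMC moves rather than fresh prior draws); the Gaussian toy model is hypothetical:

```python
import random

def adaptive_abc(prior_sample, simulate, y, dist, n_particles=1000,
                 n_rounds=5, quantile=0.5, seed=0):
    """Simplified adaptive-tolerance ABC sketch: at each round, set
    eps_n to a quantile of the current particles' distances to the
    observation, then refill the population with prior draws whose
    simulated data fall within eps_n (plain rejection; the actual
    SMC scheme of Del Moral-Doucet-Jasra moves particles instead)."""
    rng = random.Random(seed)
    particles = [prior_sample(rng) for _ in range(n_particles)]
    dists = [dist(simulate(th, rng), y) for th in particles]
    eps_schedule = []
    for _ in range(n_rounds):
        eps = sorted(dists)[int(quantile * n_particles)]
        eps_schedule.append(eps)
        new_p, new_d = [], []
        while len(new_p) < n_particles:
            th = prior_sample(rng)
            d = dist(simulate(th, rng), y)
            if d < eps:
                new_p.append(th)
                new_d.append(d)
        particles, dists = new_p, new_d
    return particles, eps_schedule

# Hypothetical toy model: theta ~ N(0,1), y | theta ~ N(theta,1), y = 1.
particles, eps_schedule = adaptive_abc(lambda rng: rng.gauss(0, 1),
                                       lambda th, rng: rng.gauss(th, 1.0),
                                       y=1.0, dist=lambda a, b: abs(a - b))
```

Because each round's quantile is taken over distances that already satisfy the previous tolerance, the schedule \(\epsilon_1 > \epsilon_2 > \dots\) shrinks automatically, without being chosen by hand.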