Recall that with MCMC we assume we can compute γ pointwise
What if we cannot even compute γ pointwise?
Often this comes up when the likelihood factor is intractable
This setup is known as a “doubly intractable” problem
Example: the generative model is a black box that can simulate datasets, but we do not know how it works internally
Approximate Bayesian Computation (ABC)
One approach to doubly intractable problems
Intuition:
approximate the likelihood by generating synthetic datasets (often easier than computing the likelihood);
the closer the generated data are to the actual data, the higher the estimate of the likelihood
Simplest example (ABC 1): say there are only two possible parameter values, θ1 and θ2, and discrete data y
Simulate y_{1,1}, y_{1,2}, …, y_{1,N} ∼ p(⋅|θ1)
Compute p̂(y|θ1) = (1/N) ∑_i 1[y_{1,i} = y] → p(y|θ1)
Then simulate y_{2,1}, y_{2,2}, …, y_{2,N} ∼ p(⋅|θ2)
Compute p̂(y|θ2) = (1/N) ∑_i 1[y_{2,i} = y] → p(y|θ2)
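A minimal Python sketch of ABC 1. The simulator here (counting heads in 5 flips of a θ-biased coin) is a hypothetical stand-in for a black-box generative model; `simulate` and `abc1_likelihood` are illustrative names, not from the source.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, rng):
    # Hypothetical black-box simulator: number of heads in 5 flips
    # of a coin with bias theta (a discrete dataset).
    return rng.binomial(n=5, p=theta)

def abc1_likelihood(y, theta, N, rng):
    # ABC 1 estimator: fraction of the N synthetic datasets that
    # exactly match the observed dataset y.
    matches = sum(simulate(theta, rng) == y for _ in range(N))
    return matches / N

y_obs = 4
for theta in (0.3, 0.7):  # the two candidate values theta1, theta2
    print(theta, abc1_likelihood(y_obs, theta, N=100_000, rng=rng))
```

By the law of large numbers, each estimate converges to the true likelihood p(y|θ) as N → ∞.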
Second example (ABC 2): what if y is continuous/combinatorial?
Assume we have a distance measure on datasets y, d(y,y′) (often based on summary statistics)
Let ϵ>0 denote a tolerance
Use the following estimator instead: p̂(y|θ) = (1/N) ∑_i 1[d(y_i, y) < ϵ] → p({y′ : d(y, y′) < ϵ} | θ), where y_1, …, y_N ∼ p(⋅|θ)
Note: RHS is not equal to p(y|θ), hence the “Approximate” in ABC
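A sketch of ABC 2 in the same hypothetical setup: continuous data, a distance based on a summary statistic, and a tolerance ϵ; the specific simulator, statistic, and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, rng, n=50):
    # Hypothetical simulator for continuous data: n iid N(theta, 1) draws.
    return rng.normal(loc=theta, scale=1.0, size=n)

def distance(y, y_prime):
    # Distance between datasets via a summary statistic (here, the mean).
    return abs(np.mean(y) - np.mean(y_prime))

def abc2_likelihood(y, theta, N, eps, rng):
    # ABC 2 estimator: fraction of synthetic datasets within the eps-ball
    # around y; converges to p({y' : d(y, y') < eps} | theta), not p(y | theta).
    hits = sum(distance(y, simulate(theta, rng)) < eps for _ in range(N))
    return hits / N

y_obs = rng.normal(loc=0.5, scale=1.0, size=50)
print(abc2_likelihood(y_obs, theta=0.5, N=10_000, eps=0.1, rng=rng))
```

Shrinking ϵ reduces the approximation error but also the hit rate, so N must grow to keep the estimator’s variance under control.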
Poll: which algorithms can get arbitrarily close to the true posterior by increasing the computational budget?
ABC 1, ABC 2, and MCMC
ABC 1 and ABC 2
ABC 1 and MCMC
ABC 1 only
ABC 2 only
Key difference between ABC and MCMC
With MCMC, we can always get arbitrarily close to the true posterior by increasing the computational budget K (the number of iterations)
This situation is called an “exact approximation”
With ABC, we cannot always get arbitrarily close to the true posterior by increasing the computational budget N (the number of generated datasets), or even by letting ϵ → 0, because d(⋅,⋅) does not in general capture the full information in the likelihood
For ABC 1 we can; for ABC 2, in general, we cannot
Recent work studies the impact of this residual error on the calibration properties of the approximation; see for example this review
Idea: ABC is an exact approximation for a perturbed target, π_{ϵ,ABC}(θ) ∝ prior(θ) ∫ likelihood(y′|θ) 1[d(y, y′) < ϵ] dy′
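To make this concrete, here is a hedged sketch of the classic ABC rejection sampler; under the stated (hypothetical) prior, simulator, and distance, each accepted θ is an exact draw from the perturbed target π_{ϵ,ABC}.

```python
import numpy as np

rng = np.random.default_rng(3)

def prior_sample(rng):
    # Hypothetical prior: theta ~ Uniform(0, 1).
    return rng.uniform(0.0, 1.0)

def simulate(theta, rng, n=50):
    # Hypothetical black-box simulator: n iid N(theta, 1) draws.
    return rng.normal(loc=theta, scale=1.0, size=n)

def distance(y, y_prime):
    # Summary-statistic distance between datasets.
    return abs(np.mean(y) - np.mean(y_prime))

def abc_rejection(y, eps, n_samples, rng):
    # ABC rejection: draw theta from the prior, simulate a synthetic dataset,
    # keep theta iff it lands in the eps-ball around y. Accepted draws are
    # distributed exactly according to
    # pi_{eps,ABC}(theta) ∝ prior(theta) ∫ likelihood(y'|theta) 1[d(y,y') < eps] dy'
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample(rng)
        if distance(y, simulate(theta, rng)) < eps:
            accepted.append(theta)
    return np.array(accepted)

y_obs = simulate(0.5, rng)  # pretend theta = 0.5 generated the observed data
samples = abc_rejection(y_obs, eps=0.05, n_samples=500, rng=rng)
print(samples.mean(), samples.std())
```

The price of exactness with respect to π_{ϵ,ABC} is the acceptance rate: each accepted sample can require many simulator calls, especially for small ϵ.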
State of the art: automatic, adaptive shrinking of the tolerance ϵ
The algorithm from the first slide seems hopelessly impractical; there are much better alternatives available!
Sequential change of measure (the details will be easier to follow once we cover SMC, hopefully in 2 weeks)