STAT 520 - Bayesian Analysis

Alexandre Bouchard-Côté

5/1/2019

Today

Program

Logistics

Recall: model selection notation

\[ m_i(x) = \int p_i(z) \ell_i(x | z) {\text{d}}z \]

Recall: key idea for Bayesian model selection

Put a prior \(p\) on \(I\), and make the uncertainty over models part of the probabilistic model.

The new joint probability density is given by:

\[ p((i, z), x) = p(i) p_i(z) \ell_i(x | z), \]

where \((i, z)\) is a member of a new latent space given by:

\[ {\mathscr{Z}} = \bigcup_{i\in I} \left( \{i\} \times {\mathscr{Z}}_i \right). \]

Recall: Bayes factor

Ratio of the marginal likelihood for two models:

\[ B_{12} = \frac{m_1(x)}{m_2(x)} \]

Values of \(B_{12}\) greater than 1.0 favor model #1 over #2. Values smaller than 1.0 favor #2 over #1.

Computation of Bayes factor / marginal likelihood

Computing \(m_i\) is hard (see the paper “19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology”).

I will outline some key methods:

Model saturation: spike and slab example

The idea is to build an augmented model, which can be written as a graphical model, and from which we can still approximate \(m_i(x)\).

Construction of the auxiliary latent space:

Variation on this theme: spike and slab via saturation
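As a concrete illustration of the saturation idea (a hypothetical toy, not the lecture's own example), consider a spike-and-slab model in which a model index \(\gamma \in \{0, 1\}\) switches the likelihood mean between \(0\) (spike) and a slab coefficient \(\beta\). The key point is that \(\beta\) stays in the state and is resampled from its prior even when \(\gamma = 0\), so the Gibbs sampler never changes dimension; the posterior frequency of \(\gamma = 1\) then estimates the model probability, which is available in closed form here by conjugacy.

```python
import math, random

random.seed(1)

# Toy spike-and-slab (hypothetical example):
#   gamma ~ Bernoulli(1/2),  beta ~ N(0, 1),  x | gamma, beta ~ N(gamma * beta, 1)
x = 2.0

gamma, beta = 1, 0.0
count_gamma1 = 0
n_iter = 20000
for _ in range(n_iter):
    # Gibbs update for beta | gamma, x
    if gamma == 1:
        beta = random.gauss(x / 2, math.sqrt(0.5))  # conjugate Gaussian posterior
    else:
        beta = random.gauss(0, 1)  # disconnected from the data: resample from the slab prior
    # Gibbs update for gamma | beta, x, proportional to N(x; gamma * beta, 1)
    w1 = math.exp(-0.5 * (x - beta) ** 2)
    w0 = math.exp(-0.5 * x ** 2)
    gamma = 1 if random.random() < w1 / (w0 + w1) else 0
    count_gamma1 += gamma

p1_hat = count_gamma1 / n_iter

# Exact answer: p(gamma = 1 | x) = m_1(x) / (m_0(x) + m_1(x)) with
# m_1(x) = N(x; 0, 2) (gamma = 1 marginal) and m_0(x) = N(x; 0, 1)
m1 = math.exp(-x ** 2 / 4) / math.sqrt(4 * math.pi)
m0 = math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)
exact = m1 / (m0 + m1)
print(p1_hat, exact)
```

With equal prior model probabilities, the Bayes factor estimate is just the posterior odds, \(\hat p(\gamma = 1 | x) / \hat p(\gamma = 0 | x)\).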

Recall: Thermodynamic integration

We start with the following identity, which holds under the assumption that we can swap a derivative and an integral (see for example Folland, Real Analysis):

\[ \begin{aligned} \log C_1 &= \int_0^1 \left( \int \pi_t(z) \log \ell(x|z) \,{\text{d}}z \right) {\text{d}}t \\ &=: \int_0^1 g(t) \,{\text{d}}t \end{aligned} \]
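To make the identity concrete, here is a hypothetical conjugate toy in Python (not from the lecture): prior \(z \sim N(0,1)\), likelihood \(x \mid z \sim N(z, 1)\), so each tempered posterior \(\pi_t\) is Gaussian and can be sampled exactly; in realistic models each \(g(t)\) would instead be estimated by MCMC. The trapezoidal rule over \(t\) then recovers \(\log C_1 = \log N(x; 0, 2)\).

```python
import math, random

random.seed(1)

x = 1.0  # single observation; prior z ~ N(0,1), likelihood x | z ~ N(z, 1)

def log_lik(z):
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - z) ** 2

def g(t, n=4000):
    # g(t) = E_{pi_t}[log lik]; by conjugacy pi_t is N(t x / (1+t), 1 / (1+t))
    var_t = 1.0 / (1.0 + t)
    mean_t = t * x / (1.0 + t)
    sd_t = math.sqrt(var_t)
    return sum(log_lik(random.gauss(mean_t, sd_t)) for _ in range(n)) / n

# trapezoidal rule over the annealing parameter t in [0, 1]
ts = [i / 50 for i in range(51)]
gs = [g(t) for t in ts]
log_C1_hat = sum((ts[i + 1] - ts[i]) * (gs[i] + gs[i + 1]) / 2 for i in range(50))

exact = -0.5 * math.log(4 * math.pi) - x ** 2 / 4  # log N(x; 0, 2)
print(log_C1_hat, exact)
```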

SMC: wrap-up

See files/SMC.pdf

Factoid useful soon: SMC’s estimator of the marginal likelihood is unbiased.
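A quick numerical check of this factoid, on a hypothetical toy where a single annealing step reduces SMC to plain importance sampling from the prior (prior \(z \sim N(0,1)\), likelihood \(N(x; z, 1)\), so the true normalization \(Z = N(x; 0, 2)\) is known):

```python
import math, random

random.seed(1)

x = 1.0  # prior z ~ N(0,1), likelihood N(x; z, 1); true Z = N(x; 0, 2)

def Z_hat(n=100):
    # one-step SMC = importance sampling from the prior; average of unnormalized weights
    ws = [math.exp(-0.5 * (x - random.gauss(0, 1)) ** 2) / math.sqrt(2 * math.pi)
          for _ in range(n)]
    return sum(ws) / n

Z = math.exp(-x ** 2 / 4) / math.sqrt(4 * math.pi)
avg = sum(Z_hat() for _ in range(2000)) / 2000  # average of many independent estimates
print(avg, Z)  # the average matches Z: the estimator is unbiased
```

Each individual \(\hat Z\) is noisy; unbiasedness is a statement about the average. Note that \(\log \hat Z\) is, in contrast, biased downward by Jensen's inequality.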

Pseudo-marginal methods

Setup

Twist in the pseudo-marginal context (Beaumont, 2003): we assume that the likelihood \(p(y|\theta)\) is difficult or impossible to compute pointwise, but that it can be estimated in a positive unbiased fashion.

Formally: there is a function \(T(u, y, \theta)\), where \(u\) is an auxiliary variable in an arbitrary space, with density \(m(u|y,\theta)\), such that:

For example, in the context of a state-space model:

Pseudo marginal methods: executive summary

Pseudo marginal methods: some details

From any positive unbiased estimator, we can now build a Metropolis-Hastings algorithm on a certain extended space. The extended target is given by: \[\begin{equation}\label{eq:extended} \tilde \pi(\theta, u) \propto p(\theta) T(u, y, \theta) m(u | y, \theta). \end{equation}\]

By Assumption (unbiased), the posterior distribution of interest is indeed a marginal of the extended target: \[ \int \tilde \pi(\theta, u) {\text{d}}u = \pi(\theta). \] Assumption (positive) and the above argument imply that the extended target is indeed a well-defined density.

Now we build a Metropolis-Hastings algorithm on the extended target as follows. Let \(q(\theta'|\theta)\) denote a user-provided proposal on the parameters. We augment it to a proposal on \(u\) and \(\theta\) using \[ q(\theta', u'|\theta, u) = q(\theta'|\theta) m(u'|y, \theta'). \] The Metropolis-Hastings ratio arising from this proposal is given by:

\[ \begin{align*} \frac{\tilde \pi(\theta', u')}{\tilde \pi(\theta, u)} \frac{q(\theta, u|\theta', u')}{q(\theta', u'|\theta, u)} &= \frac{p(\theta') T(u', y, \theta') m(u' | y, \theta')}{p(\theta) T(u, y, \theta) m(u | y, \theta)} \frac{q(\theta|\theta') m(u|y, \theta)}{q(\theta'|\theta) m(u'|y, \theta')} \\ &= \frac{p(\theta') T(u', y, \theta')}{p(\theta) T(u, y, \theta)} \frac{q(\theta|\theta') }{q(\theta'|\theta)}. \end{align*} \]

From Assumption (computability), this ratio can be computed, so we have an MCMC algorithm targeting \(\tilde \pi\). By discarding the \(u\) part of the samples, we therefore get an MCMC targeting \(\pi\).

This yields the following algorithm:
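Since the slide's own listing is not reproduced here, the following is a hypothetical minimal sketch in Python. The model is a Gaussian toy: prior \(\theta \sim N(0,1)\), exact likelihood \(N(y; \theta, s^2 + \tau^2)\), estimated without bias by averaging \(N(y; \theta + u_k, s^2)\) over auxiliary draws \(u_k \sim N(0, \tau^2)\). The key mechanics are (i) drawing a fresh estimate at each proposal and (ii) recycling the stored estimate for the current state.

```python
import math, random

random.seed(1)

y = 1.5
s2, tau2 = 0.5, 0.5  # exact likelihood: N(y; theta, s2 + tau2)
K = 10               # auxiliary draws per likelihood estimate

def normal_pdf(v, mean, var):
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def lik_hat(theta):
    # positive unbiased estimator T(u, y, theta), averaging over u_k ~ N(0, tau2)
    return sum(normal_pdf(y, theta + random.gauss(0, math.sqrt(tau2)), s2)
               for _ in range(K)) / K

def log_prior(theta):
    return -0.5 * theta ** 2

theta, T = 0.0, lik_hat(0.0)
samples = []
for _ in range(20000):
    prop = theta + random.gauss(0, 1.0)  # random-walk proposal q
    T_prop = lik_hat(prop)               # fresh estimate at the proposed point
    ratio = math.exp(log_prior(prop) - log_prior(theta)) * T_prop / T
    if random.random() < ratio:
        theta, T = prop, T_prop          # store the estimate with the state
    samples.append(theta)

post_mean = sum(samples[5000:]) / (20000 - 5000)
exact_mean = y / (1 + s2 + tau2)  # conjugate Gaussian posterior mean
print(post_mean, exact_mean)
```

Discarding the stored estimates \(T\) and keeping only \(\theta\) gives samples targeting \(\pi(\theta)\) exactly, even though the likelihood is never evaluated pointwise.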

Side note: Approximate Bayesian Computation (ABC)

Reversible jump

Key advantage:

Reversible jump algorithm

Dimensionality matching: a necessary condition for the mapping to be diffeomorphic is that the input dimensionality of \(\Psi\) should match the output dimensionality of \(\Psi\).

Consequence: let us say that we want to “jump” from a model with \(m_1\) dimensions into one with \(m_2\) dimensions. What constraints do we have on the number \(n_1\) of auxiliary variables we add to the first model, and the number \(n_2\) we add to the second?
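For the record, the standard answer is that the two augmented spaces must have equal total dimension, since a diffeomorphism preserves dimensionality:

\[ m_1 + n_1 = m_2 + n_2. \]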

Notation:

Ratio for RJMCMC:

\[ \frac{p(i')\pi_{i'}(x')}{p(i)\pi_i(x)} \frac{\rho_{i'\to i}}{\rho_{i\to i'}} \frac{g_{i'}(u_{i'})}{g_{i}(u_{i})} \left| J(x', u_2) \right| \]

Example: textbook, page 365.

Office hour materials

Set-up

Upgrading Blang (optional)

Hierarchical model for Ariane rockets

Setting up the code

--model.data data/failure_counts.csv 
--model.rocketTypes.name LV.Type
--postProcessor DefaultPostProcessor
--postProcessor.imageFormat pdf
--engine PT 
--engine.nChains 10

High-level overview of Blang

See files/blang-high-level.pdf

Running the code

./gradlew clean
./gradlew installDist
java -Xmx2g -cp build/install/blang520Assign/lib/\* demo.ArianeHierarchical [COMMAND LINE OPTIONS]

Some quick experimentation with this model

Mini-exercise:

Deciding on flat vs hierarchical

Idea: I run the code first with --model.useHierarchical true, then with --model.useHierarchical false, and compute the Bayes factor from that.

Interpreting the standard out messages

We will go over the meaning of the output below in the office hours.

Note: data source data/failure_counts.csv does not contain column failureprobabilities --- treating as missing
WARNING: There were files not up to date in the code repository (see /Users/bouchard/w/blangExample/results/all/2019-04-01-12-43-49-TokB5Hso.exec/executionInfo/code/dirty-files.txt)
Preprocess {
  17 samplers constructed with following prototypes:
    RealScalar sampled via: [RealSliceSampler]
  Initialization {
    Warning: small concentrations may cause numeric instability to Dirichlet and Beta distributions. Consider enforcing a lower bound of say 0.5 This message may also occur when slice samling outside of such constraint, you can then ignore this message. 
    Propagation [ annealParam=0.0 ess=0.900 ]
    Propagation [ annealParam=0.001 ess=0.736 ]
    Propagation [ annealParam=0.004 ess=0.563 ]
    Propagation [ annealParam=0.008 ess=0.410 ]
    Resampling [ iter=3 annealParam=0.014 logNormalization=-1.678 ]
    Propagation [ annealParam=0.014 ess=0.900 ]
    Propagation [ annealParam=0.022 ess=0.717 ]
    Propagation [ annealParam=0.034 ess=0.548 ]
    Propagation [ annealParam=0.051 ess=0.413 ]
    Resampling [ iter=7 annealParam=0.082 logNormalization=-4.323 ]
    Propagation [ annealParam=0.082 ess=0.958 ]
    Propagation [ annealParam=0.111 ess=0.793 ]
    Propagation [ annealParam=0.179 ess=0.677 ]
    Propagation [ annealParam=0.222 ess=0.463 ]
    Resampling [ iter=0 annealParam=0.305 logNormalization=-9.731 ]
    Propagation [ annealParam=0.305 ess=0.992 ]
    Propagation [ annealParam=0.333 ess=0.849 ]
    Propagation [ annealParam=0.434 ess=0.827 ]
    Propagation [ annealParam=0.444 ess=0.596 ]
    Propagation [ annealParam=0.556 ess=0.414 ]
    Resampling [ iter=0 annealParam=0.667 logNormalization=-17.196 ]
    Propagation [ annealParam=0.667 ess=0.900 ]
    Propagation [ annealParam=0.771 ess=0.889 ]
    Propagation [ annealParam=0.778 ess=0.707 ]
    Propagation [ annealParam=0.889 ess=0.548 ]
  } [ endingBlock=Initialization blockTime=338.4ms blockNErrors=1 ]
} [ endingBlock=Preprocess blockTime=587.1ms blockNErrors=1 ]
Inference {
  Round(1/9) {
    Performing 1220 moves... [ nScans=2 nChains=10 movesPerScan=61 ]
    swapSummaries [ round=0 lowest=0.0 average=0.375 ]
    logNormalizationContantProgress [ round=0 value=-142.950 ]
    estimatedLambda [ round=0 value=5.622 ]
    actualTemperedRestarts [ round=0 count=0 rate=0.0 ]
    asymptoticRoundTripBound [ round=0 count=0.151 rate=0.076 ]
  } [ endingBlock=Round(1/9) blockTime=71.62ms blockNErrors=0 ]
  Round(2/9) {
    Performing 2440 moves... [ nScans=4 nChains=10 movesPerScan=61 ]
    swapSummaries [ round=1 lowest=0.100 average=0.477 ]
    logNormalizationContantProgress [ round=1 value=-17.644 ]
    estimatedLambda [ round=1 value=4.710 ]
    actualTemperedRestarts [ round=1 count=0 rate=0.0 ]
    asymptoticRoundTripBound [ round=1 count=0.350 rate=0.088 ]
  } [ endingBlock=Round(2/9) blockTime=81.04ms blockNErrors=0 ]
...
  Round(8/9) {
    Performing 153110 moves... [ nScans=251 nChains=10 movesPerScan=61 ]
    swapSummaries [ round=7 lowest=0.595 average=0.668 ]
    logNormalizationContantProgress [ round=7 value=-20.780 ]
    estimatedLambda [ round=7 value=2.992 ]
    actualTemperedRestarts [ round=7 count=11 rate=0.044 ]
    asymptoticRoundTripBound [ round=7 count=31.436 rate=0.125 ]
  } [ endingBlock=Round(8/9) blockTime=811.1ms blockNErrors=0 ]
  Round(9/9) {
    Performing 305000 moves... [ nScans=500 nChains=10 movesPerScan=61 ]
    swapSummaries [ round=8 lowest=0.574 average=0.658 ]
    logNormalizationContantProgress [ round=8 value=-20.692 ]
    estimatedLambda [ round=8 value=3.075 ]
    actualTemperedRestarts [ round=8 count=34 rate=0.068 ]
    asymptoticRoundTripBound [ round=8 count=61.347 rate=0.123 ]
  } [ endingBlock=Round(9/9) blockTime=1.473s blockNErrors=0 ]
} [ endingBlock=Inference blockTime=3.505s blockNErrors=0 ]
Postprocess {
  Post-processing allLogDensities
  Post-processing energy
  Post-processing failureProbabilities
  Post-processing logDensity
  Post-processing m
  Post-processing s
  MC diagnostics
} [ endingBlock=Postprocess blockTime=27.55s blockNErrors=0 ]
executionMilliseconds : 31649
outputFolder : /Users/bouchard/w/blangExample/results/all/2019-04-01-12-43-49-TokB5Hso.exec

Looking at the files produced

Look at results/latest. We will go over those in the office hours. A copy of what is expected is available here.

Understanding the command line options

Add --help to the options. You should see the following, which we will go over:

Note: data source data/failure_counts.csv does not contain column failureprobabilities --- treating as missing
 --checkIsDAG <boolean> (default value: true)

 --engine <PosteriorInferenceEngine: SCM|PT|Forward|Exact|None|fully qualified> (default value: SCM)

 --engine.adaptFraction <double> (default value: 0.5)

 --engine.initialization <InitType: COPIES|FORWARD|SCM> (default value: SCM)

 --engine.ladder <TemperatureLadder: Geometric|EquallySpaced|Polynomial|UserSpecified|fully qualified> (default value: EquallySpaced)

 --engine.nChains <Integer> (optional)
   description: If unspecified, use the number of threads.

 --engine.nPassesPerScan <double> (default value: 3)

 --engine.nScans <int> (default value: 1_000)

 --engine.nThreads <Cores: Single|Max|Dynamic|Fixed|fully qualified> (default value: Dynamic)

 --engine.nThreads.fraction <double> (default value: 0.5)

 --engine.nThreads.ignoreUtilizedCores <boolean> (default value: true)

 --engine.nThreads.verbose <boolean> (default value: false)

 --engine.random <Random> (default value: 1)

 --engine.reversible <boolean> (default value: false)

 --engine.scmInit.maxAnnealingParameter <double> (default value: 1.0)
   description: Use higher values for likelihood maximization

 --engine.scmInit.nFinalRejuvenations <int> (default value: 5)
   description: Number of rejuvenation passes to do after the change of measure.

 --engine.scmInit.nParticles <int> (default value: 1_000)

 --engine.scmInit.nThreads <Cores: Single|Max|Dynamic|Fixed|fully qualified> (default value: Dynamic)

 --engine.scmInit.nThreads.fraction <double> (default value: 0.5)

 --engine.scmInit.nThreads.ignoreUtilizedCores <boolean> (default value: true)

 --engine.scmInit.nThreads.verbose <boolean> (default value: false)

 --engine.scmInit.random <Random> (default value: 1)
   description: Random seed used for proposals and resampling.

 --engine.scmInit.resamplingESSThreshold <double> (default value: 0.5)
   description: If the (relative) Effective Sample Size (ESS) falls below, perform a resampling round.

 --engine.scmInit.resamplingScheme <ResamplingScheme: STRATIFIED|MULTINOMIAL> (default value: STRATIFIED)

 --engine.scmInit.temperatureSchedule <TemperatureSchedule: AdaptiveTemperatureSchedule|FixedTemperatureSchedule|fully qualified> (default value: AdaptiveTemperatureSchedule)
   description: Algorithm selecting annealing parameter increments.

 --engine.scmInit.temperatureSchedule.nudgeFromZeroIfOutOfSupport <double> (default value: 1e-10)
   description: If all particles are out of support at first iteration, nudge the temperature a bit so that support constraints kick in.

 --engine.scmInit.temperatureSchedule.threshold <double> (default value: 0.9999)
   description: Annealing parameter is selected to get the (conditional) ESS decrease specified by this parameter.

 --engine.scmInit.temperatureSchedule.useConditional <boolean> (default value: true)
   description: See Zhou, Johansen and Aston (2013).

 --engine.targetAccept <Double> (optional)

 --engine.usePriorSamples <boolean> (default value: true)

 --excludeFromOutput <List: Space separated items or "file <path>" to load from newline separated file> (optional)

 --experimentConfigs.configFile <File> (optional)
   description: If set, use those arguments in provided file that do not appear in the provided arguments.

 --experimentConfigs.managedExecutionFolder <boolean> (default value: true)
   description: Automatically organize results into subdirectories of 'results/all'?

 --experimentConfigs.maxIndentationToPrint <int> (default value: inf)
   description: Use -1 to silence all HLogs output

 --experimentConfigs.recordExecutionInfo <boolean> (default value: true)
   description: Record information such as timing, main class, code version, etc for this run?

 --experimentConfigs.recordGitInfo <boolean> (default value: true)

 --experimentConfigs.saveStandardStreams <boolean> (default value: true)
   description: Save combined standard out and err into a file?

 --experimentConfigs.tabularWriter <TabularWriterFactory: CSV|Spark|fully qualified> (default value: CSV)

 --initRandom <Random> (default value: 1)

 --model <ModelBuilder: fully qualified>

 --model.data <GlobalDataSource: Path to the DataSource.>

 --model.data.reader <DataSourceReader: CSV|fully qualified> (default value: CSV)

 --model.data.reader.commentCharacter <Character> (optional)

 --model.data.reader.ignoreLeadingWhiteSpace <boolean> (default value: true)

 --model.data.reader.separator <char> (default value: ,)

 --model.data.reader.strictQuotes <boolean> (default value: false)

 --model.failureProbabilities.dataSource <DataSource: Path to the DataSource.>

 --model.failureProbabilities.dataSource.reader <DataSourceReader: CSV|fully qualified> (default value: CSV)

 --model.failureProbabilities.dataSource.reader.commentCharacter <Character> (optional)

 --model.failureProbabilities.dataSource.reader.ignoreLeadingWhiteSpace <boolean> (default value: true)

 --model.failureProbabilities.dataSource.reader.separator <char> (default value: ,)

 --model.failureProbabilities.dataSource.reader.strictQuotes <boolean> (default value: false)

 --model.failureProbabilities.name <ColumnName> (optional)
   description: Name of variable in the plate

 --model.m <RealVar: A number or NA> (optional)

 --model.numberOfFailures.dataSource <DataSource: Path to the DataSource.>

 --model.numberOfFailures.dataSource.reader <DataSourceReader: CSV|fully qualified> (default value: CSV)

 --model.numberOfFailures.dataSource.reader.commentCharacter <Character> (optional)

 --model.numberOfFailures.dataSource.reader.ignoreLeadingWhiteSpace <boolean> (default value: true)

 --model.numberOfFailures.dataSource.reader.separator <char> (default value: ,)

 --model.numberOfFailures.dataSource.reader.strictQuotes <boolean> (default value: false)

 --model.numberOfFailures.name <ColumnName> (optional)
   description: Name of variable in the plate

 --model.numberOfLaunches.dataSource <DataSource: Path to the DataSource.>

 --model.numberOfLaunches.dataSource.reader <DataSourceReader: CSV|fully qualified> (default value: CSV)

 --model.numberOfLaunches.dataSource.reader.commentCharacter <Character> (optional)

 --model.numberOfLaunches.dataSource.reader.ignoreLeadingWhiteSpace <boolean> (default value: true)

 --model.numberOfLaunches.dataSource.reader.separator <char> (default value: ,)

 --model.numberOfLaunches.dataSource.reader.strictQuotes <boolean> (default value: false)

 --model.numberOfLaunches.name <ColumnName> (optional)
   description: Name of variable in the plate

 --model.rocketTypes.dataSource <DataSource: Path to the DataSource.>

 --model.rocketTypes.dataSource.reader <DataSourceReader: CSV|fully qualified> (default value: CSV)

 --model.rocketTypes.dataSource.reader.commentCharacter <Character> (optional)

 --model.rocketTypes.dataSource.reader.ignoreLeadingWhiteSpace <boolean> (default value: true)

 --model.rocketTypes.dataSource.reader.separator <char> (default value: ,)

 --model.rocketTypes.dataSource.reader.strictQuotes <boolean> (default value: false)

 --model.rocketTypes.maxSize <Integer> (optional)

 --model.rocketTypes.name <ColumnName> (optional)

 --model.s <RealVar: A number or NA> (optional)

 --model.useHierarchical <Boolean> (optional)

 --postProcessor <PostProcessor: DefaultPostProcessor|NoPostProcessor|fully qualified> (default value: NoPostProcessor)

 --postProcessor.blangExecutionDirectory <File> (optional)
   description: When called from Blang, this will be the latest run, otherwise point to the .exec folder created by Blang

 --postProcessor.burnInFraction <double> (default value: 0.5)

 --postProcessor.essEstimator <EssEstimator: BATCH|ACT|AR> (default value: BATCH)

 --postProcessor.experimentConfigs.configFile <File> (optional)
   description: If set, use those arguments in provided file that do not appear in the provided arguments.

 --postProcessor.experimentConfigs.managedExecutionFolder <boolean> (default value: true)
   description: Automatically organize results into subdirectories of 'results/all'?

 --postProcessor.experimentConfigs.maxIndentationToPrint <int> (default value: inf)
   description: Use -1 to silence all HLogs output

 --postProcessor.experimentConfigs.recordExecutionInfo <boolean> (default value: true)
   description: Record information such as timing, main class, code version, etc for this run?

 --postProcessor.experimentConfigs.recordGitInfo <boolean> (default value: true)

 --postProcessor.experimentConfigs.saveStandardStreams <boolean> (default value: true)
   description: Save combined standard out and err into a file?

 --postProcessor.experimentConfigs.tabularWriter <TabularWriterFactory: CSV|Spark|fully qualified> (default value: CSV)

 --postProcessor.facetHeight <double> (default value: 2.0)
   description: In inches

 --postProcessor.facetWidth <double> (default value: 4.0)
   description: In inches

 --postProcessor.imageFormat <String> (default value: png)

 --postProcessor.rCmd <String> (default value: Rscript)

 --printAccessibilityGraph <boolean> (default value: false)

 --samplers.additional <SamplerSet: Fully qualified instances of blang.mcmc.Sampler>
   description: Samplers to be added.

 --samplers.excluded <SamplerSet: Fully qualified instances of blang.mcmc.Sampler>
   description: Samplers to be excluded (only useful if useAnnotation = true).

 --samplers.useAnnotation <boolean> (default value: true)
   description: If the arguments of the annotations @Samplers should be used to determine a starting set of sampler types.

 --stripped <boolean> (default value: false)
   description: Stripped means that the construction of forward simulators and annealers is skipped

 --treatNaNAsNegativeInfinity <boolean> (default value: false)

 --version <String> (optional)
   description: Version of the blang SDK to use (see https://github.com/UBC-Stat-ML/blangSDK/releases), of the form of a git tag x.y.z where x >= 2. If omitted, use the local SDK's 'master' version.

Model selection

Now I run again with --model.useHierarchical false and get \(\log(m_2) \approx -37.417\). Therefore the Bayes factor is given by:

\[ \frac{m_1}{m_2} = \exp( \log m_1 - \log m_2) = \exp((-20.692) - (-37.417)) \approx 1.8 \times 10^7 \]
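A one-line check of this arithmetic (always work on the log scale and exponentiate at the end, since factors this large overflow quickly when multiplied naively):

```python
import math

log_m1 = -20.692  # hierarchical model (from the PT run above)
log_m2 = -37.417  # flat model
B12 = math.exp(log_m1 - log_m2)
print(B12)  # roughly 1.8e7
```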

So the evidence for the hierarchical model vs. the flat model is “decisive” (see this page for terminology, but take it with a pinch of salt!)

Model selection, again

Note: the version with --model.useHierarchical false is a bit strange in the sense that the random variables \(m\) and \(s\) are still sampled over (but no longer connected to the rest of the model).

model ArianeFlat {
    
  param GlobalDataSource data
  param Plate<String> rocketTypes
  param Plated<IntVar> numberOfLaunches
  random Plated<RealVar> failureProbabilities
  random Plated<IntVar> numberOfFailures
  
  laws {
    for (Index<String> rocketType : rocketTypes.indices.filter[key.startsWith("Ariane")]) {
      failureProbabilities.get(rocketType)  ~ Beta(1.0, 1.0)
      numberOfFailures.get(rocketType)
        | RealVar failureProbability = failureProbabilities.get(rocketType),
          IntVar numberOfLaunch = numberOfLaunches.get(rocketType)
        ~ Binomial(numberOfLaunch, failureProbability)
    }
  }
}

Running this we get an estimate of \(\log m_2 \approx -37.589\), which is very close to our old estimate, \(-37.417\).

In fact, you can show that as the number of PT iterations and the number of chains go to infinity, these two estimates converge to the same value.

Conclusion: this confirms that we do not need to worry about parameter counting when using Bayes factors: parameters that are sampled but disconnected from the likelihood do not change the marginal likelihood.