Bayes estimator: a first example

Alexandre Bouchard-Côté

Running example: insurance policy for a Delta 7925H


Poll: Back-of-the-envelope recommendation

  1. No, since estimated probability of failure is equal to \(0/3 = 0\)
  2. No, since estimated probability of failure is between 0 and some constant
  3. No, since estimated probability of failure is between some constant and one
  4. Yes, since estimated probability of failure is smaller than some constant
  5. Yes, since estimated probability of failure is greater than some constant

Bayes estimator: first overview

The Bayes estimator,

\[\color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}\]

encodes a 3-step approach applicable to almost any statistical problem:

  1. \(\color{red}{\text{Construct a probability model}}\)
  2. \(\color{green}{\text{Compute or approximate the posterior distribution conditionally on the actual data at hand}}\)
  3. \(\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an "action"}}\)

Let us apply this 3-step process to the Delta rocket insurance problem.
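Before going through the three steps in detail, here is a minimal schematic sketch in Python (not from the lecture; the names and the finite action space are illustrative) of how the steps fit together once a PPL has produced approximate posterior samples of \(Z\):

# z_samples: approximate posterior draws of the unknowns Z given the data X (output of steps 1-2)
# actions: the action space A, assumed finite here, e.g. [0, 1] for "no insurance" / "insurance"
def bayes_action(actions, loss, z_samples):
    # Step 3: Monte Carlo estimate of E[L(a, Z) | X] for each action a ...
    expected_loss = {a: sum(loss(a, z) for z in z_samples) / len(z_samples)
                     for a in actions}
    # ... and return a minimizer, i.e. the Bayes action
    return min(expected_loss, key=expected_loss.get)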

Step 1: Probability model

\[{\textrm{argmin}}\{ \color{red}{{\mathbf{E}}}[L(a, Z) | X] : a \in {\mathcal{A}}\}\]

A Bayesian model is a probability model equipped with a:

Concretely: a Bayesian model = a joint density \(\gamma(x, z)\), where \(x\) denotes the observed data and \(z\) the unknown quantities.

Poll: What is the space of possible Bayesian models for the Delta rocket example?

  1. a single parameter, \(p \in [0, 1]\)
  2. a discrete space of cardinality \(2^4\)
  3. a list of \(2^4\) real numbers \(p_i \in [0, 1]\), which sum to one (note: fixed bug, first version wrote \(4\) instead of \(2^4\))
  4. all possible probability distributions
  5. none of the above

Choosing a Bayesian model

Two techniques for building Bayesian models

Technique one: introducing nuisance variables

Augment the joint distribution with more unknown quantities, often called “parameters”.

Example: in the rocket problem, the unknown failure probability \(p\) is introduced as such a parameter.

Two techniques for building Bayesian models

Technique two: impose “regularities”

Symmetries, (conditional) independences, identical distributions, factorization, etc.

Example: in the rocket problem, the launches are modelled as conditionally independent and identically distributed given \(p\).

Poll: Now that I have made these assumptions, do I have ONE probability model?

  1. No, \(p\) is still unknown
  2. No, \(x\) is still unknown
  3. Yes, by Bayes rule
  4. Yes, by the Law of total probability
  5. Yes, by chain rule

Prior and likelihood?

Step 2: Conditioning

\[{\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}\] \(\color{green}{\text{Compute or approximate the posterior distribution conditionally on the actual data at hand}}\)

What you learned about conditioning in an undergraduate course

To compute \({\mathbf{E}}[f(Z) | X]|_{x}\):
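Presumably: apply Bayes rule to obtain the posterior density, then integrate \(f\) against it. In the joint-density notation \(\gamma\) introduced above,

\[\pi(z | x) = \frac{\gamma(x, z)}{\int \gamma(x, z') \, {\text{d}}z'}, \qquad {\mathbf{E}}[f(Z) | X]|_{x} = \int f(z) \, \pi(z | x) \, {\text{d}}z.\]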

What you need to know about conditioning

Recall: Bayesians almost never use Bayes rule!

Instead:

Conditioning using a probabilistic programming language

Our model is:

Increasingly, applied Bayesian statisticians use probabilistic programming languages (PPLs) to approximate conditional expectations.

Some of you may be familiar with the BUGS family of languages (WinBUGS, OpenBUGS, JAGS, Stan, Blang)

Example:

launch1 | failureProbability ~ Bernoulli(failureProbability)
launch2 | failureProbability ~ Bernoulli(failureProbability)
launch3 | failureProbability ~ Bernoulli(failureProbability)

nextLaunch | failureProbability ~ Bernoulli(failureProbability)

failureProbability ~ ContinuousUniform(0, 1)
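For concreteness, these modelling choices can be spelled out as a joint density (writing \(x = (x_1, x_2, x_3)\) for the three recorded launches and \(z = (p, x_4)\) for the unknowns; this grouping into observed and unobserved variables is one natural choice):

\[\gamma(x, z) = \underbrace{{{\bf 1}}[0 \le p \le 1]}_{\text{uniform prior on } p} \; \prod_{i=1}^{4} p^{x_i} (1 - p)^{1 - x_i}.\]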

Poll: Which function \(f(z)\) should I use to cast \({\mathbb{P}}(X_4 = 1 | X)\) as \({\mathbf{E}}[f(Z)|X]\)?

  1. \(f(z) = z\)
  2. \(f(z) = {{\bf 1}}[z = 1]\)
  3. \(f(z) = 1\)
  4. \(f(z) = {{\bf 1}}[x_4 = 1]\)
  5. None of the above

Conditioning using PPLs: theory

The PPLs we will use in this course are based on Monte Carlo methods.

The theoretical foundation of Monte Carlo methods is the law of large numbers (LLN), understood here broadly as any theorem providing a result of the form

\[\frac{1}{K} \sum_{k=1}^K f(Z_k) \to \int f(z) \pi(z) {\text{d}}z\]

where \(Z_1, Z_2, \dots, Z_K\) are the samples produced by the PPL (approximately distributed according to the distribution of interest \(\pi\), here the posterior) and the limit is taken as \(K \to \infty\).

Conditioning using PPLs: practical example

During the lecture, we went over steps 1 and 2 of this tutorial, which walks you through the steps of approximating a posterior distribution using the Blang PPL (focusing on general principles which apply to most other mainstream PPLs, e.g. Stan, JAGS, pyMC). Complete the rest as a short exercise set.
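As a language-agnostic complement to the tutorial, here is a small Python sketch (the variable names and random seed are illustrative). For this particular model, assuming as in the poll above that none of the three observed launches failed, the posterior on \(p\) is available in closed form (a Beta(1, 4) distribution), so we can mimic the output of a PPL with exact posterior samples and check the LLN approximation of \({\mathbb{P}}(X_4 = 1 | X)\), whose exact value is \(1/5\):

import numpy as np

rng = np.random.default_rng(1)
K = 100_000
# stand-in for PPL output: exact posterior samples of the unknowns Z = (p, x4);
# Beta(1, 4) is the closed-form posterior for a uniform prior and 0 failures in 3 launches
p_samples = rng.beta(1.0, 4.0, size=K)
x4_samples = rng.binomial(1, p_samples)  # posterior predictive draws of the next launch
# LLN: the average of f(Z_k), with f(z) = 1[x4 = 1], approximates P(X4 = 1 | X) = 1/5
print((x4_samples == 1).mean())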

Step 3: take the Bayes action

\[\color{blue}{{\textrm{argmin}}} \{ {\mathbf{E}}[\color{blue}{L}(a, Z) | X] : a \in {\mathcal{A}}\}\]

\(\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an "action"}}\)

The loss function in our example should encode the following information: the insurance policy costs \(\$1M\); a launch failure without insurance costs \(\$50M\); a successful launch without insurance costs nothing.

Exercise: Design a loss function \(L\) that encodes the above information

Use the notation: \(a \in \{0, 1\}\) for the action (\(a = 1\): buy the insurance, \(a = 0\): do not buy it), and \(z \in \{0, 1\}\) for the outcome of the next launch (\(z = 1\): failure, \(z = 0\): success).

Example of loss function

Losses ($M):

| Launch | With insurance \((a=1)\) | Without \((a=0)\) |
| --- | --- | --- |
| Success \((z=0)\) | 1 | 0 |
| Failure \((z=1)\) | 1 | 50 |

Loss function: \(L(a, z) = {{\bf 1}}[a = 1] (\$1M) + {{\bf 1}}[a = 0] {{\bf 1}}[z = 1] (\$50M)\)

Next exercise: simplify the optimization problem in the Bayes estimator

\[\color{blue}{{\textrm{argmin}}} \{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\}\]

Example of a Bayes estimator

\[ \begin{align*} {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} &= {\textrm{argmin}}\{ {\mathbf{E}}[{{\bf 1}}[a = 1] + 50 {{\bf 1}}[a = 0] {{\bf 1}}[X_4 = 1] | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[{{\bf 1}}[a = 1] | X] + 50 {\mathbf{E}}[ {{\bf 1}}[a = 0] {{\bf 1}}[X_4 = 1] | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {{\bf 1}}[a = 1] + 50 {{\bf 1}}[a = 0] {\mathbb{P}}[X_4 = 1 | X] : a \in {\mathcal{A}}\} \\ &= {{\bf 1}}[p_\text{fail}(X) > 1/50], \end{align*} \]

where \(p_\text{fail}(X) = {\mathbb{P}}[X_4 = 1 | X]\) encodes our predicted probability that the next launch will be a failure.
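As a sanity check on this rule (using the uniform-prior model above and again assuming zero failures in the three observed launches), the posterior predictive failure probability equals the posterior mean of \(p\):

\[p_\text{fail}(X) = {\mathbf{E}}[p | X] = \frac{1}{5} > \frac{1}{50},\]

so in that case the Bayes estimator recommends buying the insurance.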

From earlier: “Assume for now that we can ‘magically’ approximate \({\mathbf{E}}[f(Z)|X]\) for any function \(f(z)\), based on the output of a PPL.”

So indeed if we pick \(f(z) = {{\bf 1}}[x_4=1]\), then \({\mathbb{P}}[X_4 = 1|X]\) can be written as \({\mathbf{E}}[f(Z)|X]\), and hence approximated based on the output of a PPL.

Go over step 3 in the first PPL tutorial.