\[\begin{align} \renewcommand{\Pr}{\mathbb{P}} \newcommand{\Bern}{\text{Bern}} \newcommand{\Unif}{\text{Unif}} \newcommand{\Bet}{\text{Beta}} \newcommand{\Bin}{\text{Bin}} \newcommand{\Categorical}{\text{Categorical}} \newcommand{\Gamm}{\text{Gamma}} \newcommand\prior{\text{prior}} \newcommand\like{\text{likelihood}} \newcommand{\E}{\mathbf{E}} \newcommand{\var}{\text{Var}} \newcommand{\sd}{\text{SD}} \newcommand{\R}{\mathbf{R}} \newcommand{\1}{{\bf 1}} \newcommand{\argmin}{\textrm{argmin}} \newcommand{\argmax}{\textrm{argmax}} \newcommand{\Ascr}{\mathcal{A}} \newcommand{\ud}{\text{d}} \newcommand\deq{\stackrel{\scriptscriptstyle d}{=}} \newcommand{\Xscr}{\mathscr{X}} \newcommand{\Zscr}{\mathscr{Z}} \newcommand{\randindex}{\textrm{rand}} \newcommand{\rhot}{\rho} \newcommand{\rhop}{{\rho'}} \end{align}\]
In this tutorial, we will go over a simple example of how to apply a Bayes estimator in practice. We will use a dataset of rocket launches to illustrate the concepts.
You are consulting for a satellite operator. The company is about to
send a $50M satellite on a Delta 7925H rocket.
Data: as of November 2020, the Delta 7925H has been launched 3
times, with 0
failed launches.
Should you recommend buying a $1M insurance policy?
Maximum likelihood would therefore give an estimate of 0% for the probability of failure, and hence conclude that no insurance should be purchased. Obviously we will not use this crazy estimate today; instead, we will use Bayesian methods.
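To make the point concrete, here is the maximum likelihood computation as a short Python sketch, using the counts from the data above:

```python
# Maximum likelihood estimate of the failure probability from raw counts.
# With 0 observed failures in 3 launches, the MLE is exactly 0, which
# would (absurdly) imply that insurance has no value at any price.
n_launches = 3
n_failures = 0

mle_failure_prob = n_failures / n_launches
print(mle_failure_prob)  # 0.0
```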
Let us use the simplest Bayesian model for this task (in a later tutorial, we will
improve this model): we assume the four launches are independent,
biased coin flips, all with a shared
probability of failure (bias) given by an unknown
parameter failureProbability
.
In other words, we assume a binomial likelihood.
We also assume a beta prior on the failureProbability
, the parameter $p$ of the binomial
(the parameter $n$ is known and set to the number of observed launches, numberOfLaunches
).
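As a plain Python sketch of this generative model, here is a forward simulation. Note that the prior hyperparameters are an assumption on our part: the text only says "a beta prior", so we use the uniform Beta(1, 1) for illustration:

```python
import random

# Forward simulation of the model: a shared failure probability drawn
# from a beta prior, then independent Bernoulli launches given it.
# The hyperparameters alpha = beta = 1 (uniform prior) are an assumption.
def simulate(n_launches=3, alpha=1.0, beta=1.0, seed=1):
    rng = random.Random(seed)
    p = rng.betavariate(alpha, beta)  # failureProbability ~ Beta(alpha, beta)
    # each launch: 1 = failure, 0 = success, independent given p
    launches = [int(rng.random() < p) for _ in range(n_launches)]
    return p, launches

p, launches = simulate()
```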
Let us walk through the steps involved in solving this problem using the Bayes estimator and the Blang Probabilistic Programming Language.
The laws { ... }
block specifies a joint distribution ("law") over a collection of random
variables, namely those marked with the keyword random
when they are declared above. Each
line of the form variable ~ ...
declares a distribution or conditional distribution.
We haven't introduced observations yet (we will do so when conditioning, in step 2),
so for now we treat all variables as unobserved (marked using ?: latentReal
and ?: latentInt
).
Click on "See output" above to see the main plots created by running the code. You can find in
particular a visualization of the marginal distribution on the random variable failureProbability
by following the tab "Plots", then "posteriorPlot" located in the top right of the page linked by "See output".
Right now, we are not conditioning on any of the random variables so the "posterior" reduces to the
joint density.
Look at the plot "failureProbability.pdf". Convince yourself that the approximation should match the prior. Check if it does.
Look at the plot "nextLaunch.pdf". Use the law of total probability to find the true value of the marginal probability mass function over the nextLaunch variable. Check the quality of the approximation based on this.
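As a sanity check for this exercise: the law of total probability gives $\Pr(\text{nextLaunch} = 1) = \E[p]$, which under a Beta$(\alpha, \beta)$ prior equals $\alpha/(\alpha+\beta)$. A Monte Carlo sketch in Python (again assuming the uniform Beta(1, 1) prior) compares the exact value to the kind of approximation Blang produces:

```python
import random

# Law of total probability: P(nextLaunch = 1) = E[p] under the prior.
# For a Beta(alpha, beta) prior this mean is alpha / (alpha + beta);
# we assume the uniform Beta(1, 1), giving 1/2, and compare against
# a Monte Carlo approximation.
rng = random.Random(0)
alpha, beta = 1.0, 1.0
exact = alpha / (alpha + beta)

n = 100_000
failures = 0
for _ in range(n):
    p = rng.betavariate(alpha, beta)  # draw from the prior
    failures += rng.random() < p      # nextLaunch ~ Bern(p)
approx = failures / n
```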
We will now introduce the actual dataset (3 successes, 0 failures).
Note that we haven't changed the description of the joint distribution (in "laws").
Rather, we changed ?: latentInt
into ?: 0
. The effect of this change is that running the
code will now approximate the posterior distribution of the variables that are unknown (latent), given
those that now have a known value (launch1, launch2, launch3).
Look at the marginal distribution of nextLaunch. It is now a posterior marginal. Notice that it has been updated by the data.
Optional: compute the exact posterior distribution using Bayes' rule (hint: to deal with the denominator, use the fact that the numerator of Bayes' rule has the same functional form as a Beta distribution with respect to the unknown quantity; this trick is called "conjugacy"). Plot the posterior density and compare it with the approximation obtained from Blang.
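As a sketch of where the conjugacy trick leads (assuming a Beta(1, 1) prior, since the tutorial does not fix the hyperparameters): the posterior is a Beta distribution whose parameters are the prior's, incremented by the failure and success counts, and the posterior predictive failure probability is its mean.

```python
# Beta-Binomial conjugacy: with a Beta(alpha, beta) prior and observed
# counts (failures, successes), the posterior is
# Beta(alpha + failures, beta + successes). We assume alpha = beta = 1.
alpha, beta = 1.0, 1.0
failures, successes = 0, 3

post_alpha = alpha + failures   # 1.0
post_beta = beta + successes    # 4.0

# Posterior predictive failure probability = posterior mean
p_fail_next = post_alpha / (post_alpha + post_beta)
print(p_fail_next)  # 0.2
```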
In the following, we let $x = (x_1, x_2, x_3)$ denote the observed data, and $z = (p, x_4)$, the unobserved data.
To make a decision, we need to encode the following into a loss function (all numbers in \$M):
With insurance $(a=1)$: loss of 1 whether the launch succeeds or fails (the premium is paid either way; the insurance covers the satellite on failure).
Without insurance $(a=0)$: loss of 0 if success, loss of 50 if failure.
We can encode this as $L(a, z) = \1[a = 1] (\$1M) + \1[a = 0] \1[x_4 = 1] (\$50M)$
The Bayes estimator tells us we should use
\[\begin{align} \argmin \{ \E[L(a, Z) | X] : a \in \Ascr \} \end{align}\]
This can be simplified into:
\[\begin{align} \argmin \{ \E[L(a, Z) | X] : a \in \Ascr \} &= \argmin \{ \E[\1[a = 1] + 50 \1[a = 0] \1[X_4 = 1] | X] : a \in \Ascr \} \\ &= \argmin \{ \E[\1[a = 1] | X] + 50 \E[ \1[a = 0] \1[X_4 = 1] | X] : a \in \Ascr \} \\ &= \argmin \{ \1[a = 1] + 50 \1[a = 0] \Pr[X_4 = 1 | X] : a \in \Ascr \} \\ &= \1[\Pr[X_4 = 1 | X] > 1/50], \end{align}\]
i.e., buy insurance exactly when the posterior predictive probability of failure exceeds $1/50$.
Combine the posterior computed in the second step with the Bayes estimator to make a decision.
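Carrying this combination out in Python (again assuming the Beta(1, 1) prior, which gives a posterior predictive failure probability of 1/5):

```python
# Decision rule derived above: buy insurance (a = 1) iff
# Pr[X_4 = 1 | X] > 1/50. The posterior predictive value 1/5
# assumes the uniform Beta(1, 1) prior used earlier.
p_fail_next = 1.0 / 5.0

expected_loss_insure = 1.0                    # premium, paid either way
expected_loss_no_insure = 50.0 * p_fail_next  # 50 * 1/5 = 10

buy_insurance = p_fail_next > 1.0 / 50.0
print(buy_insurance)  # True
```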
Optional: does the decision change if a different prior is used?
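One way to explore this optional question numerically (the priors below are arbitrary illustrations, not from the tutorial): under a Beta$(a, b)$ prior with 0 failures in 3 launches, the posterior predictive failure probability is $a/(a+b+3)$, so the decision can flip once the prior puts enough mass on low failure probabilities.

```python
# Sensitivity of the decision to the prior: with 0 failures in 3
# launches, a Beta(a, b) prior gives posterior predictive failure
# probability a / (a + b + 3). The listed priors are arbitrary choices.
def decision(a, b, n_success=3, n_fail=0, threshold=1 / 50):
    p_fail = (a + n_fail) / (a + b + n_fail + n_success)
    return p_fail > threshold

for a, b in [(1, 1), (1, 10), (0.5, 0.5), (1, 100)]:
    print((a, b), decision(a, b))
```

Under these examples the recommendation is robust to moderate priors, but a very strong prior toward success, such as Beta(1, 100), flips it.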