Point estimates, confidence estimates, and the Bayes estimator

Alexandre Bouchard-Côté

Overview

First example

Based on: example in textbook Statistical Rethinking

Bayes estimator

Recall:

\[\color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}\]

encodes a 3-step approach applicable to almost any statistical problem:

  1. \(\color{red}{\text{Construct a probability model}}\)
  2. \(\color{green}{\text{Compute or approximate the posterior distribution conditionally on the actual data at hand}}\)
  3. \(\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an action}}\)

For the globe water/land example, steps 1 and 2 are the same as in the Delta Rocket example.
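The three steps can be sketched numerically with a grid approximation. This is a minimal illustration under stated assumptions, not the course's own code: the uniform prior, the binomial likelihood, the data values `n_tosses` and `n_water`, and the helper names `expected_loss` and `bayes_action` are all hypothetical choices made for the example.

```python
from math import comb

# Step 1: probability model (assumed here, not taken from the text):
#   Z ~ Uniform(0, 1) is the unknown proportion, X | Z ~ Binomial(n_tosses, Z).
n_tosses, n_water = 9, 6                  # hypothetical data
grid = [i / 200 for i in range(201)]      # discretized values of Z

def likelihood(z):
    return comb(n_tosses, n_water) * z**n_water * (1 - z)**(n_tosses - n_water)

# Step 2: approximate the posterior on the grid.
# The uniform prior is constant, so it cancels in the normalization.
unnormalized = [likelihood(z) for z in grid]
normalizer = sum(unnormalized)
posterior = [u / normalizer for u in unnormalized]

# Step 3: pick the action with the smallest posterior expected loss.
def expected_loss(action, loss):
    return sum(w * loss(action, z) for z, w in zip(grid, posterior))

def bayes_action(loss, actions):
    return min(actions, key=lambda a: expected_loss(a, loss))

def square_loss(a, z):
    return (a - z) ** 2

best_action = bayes_action(square_loss, grid)
print(best_action)
```

Only the function passed as `loss` and the set of candidate actions change from one decision problem to the next; steps 1 and 2 stay the same.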

Model

Point estimate

Point estimate from Bayes estimator

\[\delta^*(X) = \color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}\]

Step 3: \(\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an action}}\)

Example: square loss, with \({\mathcal{A}}= {\mathbf{R}}\) and \(L(a, p) = (a - p)^2\)

\[ \begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[(Z - a)^2 | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[Z^2 | X] - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \end{aligned} \]

Poll: \(\delta^*(x)\) can be simplified to…

  1. \(\int z p(z|x) {\text{d}}z\)
  2. \({\textrm{argmax}}\{ p(z|x) : z \in {\mathbf{R}}\}\)
  3. \(\int x p(z|x) {\text{d}}x\)
  4. \({\textrm{argmax}}\{ p(z|x) : x \in {\mathbf{R}}\}\)
  5. None of the above

Point estimate from Bayes estimator

\[ \begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[(Z - a)^2 | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[Z^2 | X] - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \end{aligned} \]

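Idea: think of \({\mathbf{E}}[Z|X]\) as a constant that you get from the posterior. The term \({\mathbf{E}}[Z^2|X]\) does not depend on \(a\), which is why it can be dropped in the last line. To minimize the bottom expression, take the derivative with respect to \(a\) and set it to zero: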

\[ \begin{aligned} -2 {\mathbf{E}}[Z|X] + 2a = 0 \end{aligned} \]

Hence the Bayes estimator here is the posterior mean, \(\delta^*(X) = {\mathbf{E}}[Z|X] = \int z p(z|X) {\text{d}}z\).
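The derivation can be checked numerically on a toy discrete posterior. In this sketch the support points, the weights, and the candidate grid are hypothetical values chosen only for illustration. It compares a brute-force minimizer of the full objective \({\mathbf{E}}[(Z - a)^2 | X]\), a brute-force minimizer of the reduced objective \(-2a{\mathbf{E}}[Z|X] + a^2\), and the posterior mean.

```python
# Toy discrete posterior over Z (hypothetical values and weights).
support = [0.2, 0.5, 0.7, 0.9]
weights = [0.1, 0.3, 0.4, 0.2]

posterior_mean = sum(z * w for z, w in zip(support, weights))

def full_objective(a):
    # E[(Z - a)^2 | X]
    return sum(w * (z - a) ** 2 for z, w in zip(support, weights))

def reduced_objective(a):
    # -2 a E[Z | X] + a^2: the full objective minus the constant E[Z^2 | X]
    return -2 * a * posterior_mean + a * a

# Brute-force search over a fine grid of candidate actions.
candidates = [i / 1000 for i in range(1001)]
argmin_full = min(candidates, key=full_objective)
argmin_reduced = min(candidates, key=reduced_objective)

print(posterior_mean, argmin_full, argmin_reduced)
```

All three printed values should agree up to the grid resolution and floating-point rounding, since dropping a term that is constant in \(a\) does not move the minimizer.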

Set estimate

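Take the actions to be intervals \(a = [c, d]\). Reading the loss off the second line below, \(L([c, d], z) = {\mathbf{1}}[z \notin [c, d]] + k(d - c)\) for a constant \(k > 0\): a unit penalty for missing \(z\), plus a penalty proportional to the interval length. We get: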

\[ \begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbb{P}}[Z \notin [c, d] | X] + k(d - c) : [c,d] \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbb{P}}[Z < c|X] + {\mathbb{P}}[Z > d |X] + k(d - c) : [c,d] \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbb{P}}[Z \le c|X] - {\mathbb{P}}[Z \le d |X] + k(d - c) : [c,d] \in {\mathcal{A}}\} \end{aligned} \]

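Here we assume the posterior has a continuous density \(f = f_{Z|X}\), which lets us change \(<\) into \(\le\). The last line also uses \({\mathbb{P}}[Z > d|X] = 1 - {\mathbb{P}}[Z \le d|X]\) and drops the constant 1, which does not affect the argmin. As before, we take the derivative with respect to \(c\) and set it to zero, then do the same for \(d\). Notice that \({\mathbb{P}}[Z \le c|X]\) is the posterior CDF, so taking its derivative with respect to \(c\) yields the posterior density: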

\[ f_{Z|X}(c) - k = 0, \]

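Differentiating with respect to \(d\) in the same way gives \(-f(d) + k = 0\), so the optimum is an interval \([c, d]\) whose endpoints satisfy \(f(c) = f(d) = k\). For a unimodal posterior, this is the set of points where the density is at least \(k\), which is also the shortest interval containing that much posterior mass.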

Finally, choose \(k\) so that the interval captures, say, 95% of the posterior mass.
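The recipe for the set estimate can be sketched on a grid: for a given density height \(k\), keep the points where the density is at least \(k\), then tune \(k\) by bisection until the kept region holds about 95% of the mass. This is a rough illustration, not the course's own code. The posterior used here (uniform prior, binomial likelihood, `n_tosses`, `n_water`) and the helper `region_for_threshold` are hypothetical, and the sketch relies on that posterior being unimodal so the kept region is a single interval.

```python
# Hypothetical posterior: uniform prior with a binomial likelihood (assumed data).
n_tosses, n_water = 9, 6
n_cells = 2000
step = 1 / n_cells
grid = [i * step for i in range(n_cells + 1)]

unnormalized = [z**n_water * (1 - z)**(n_tosses - n_water) for z in grid]
total = sum(unnormalized) * step
density = [u / total for u in unnormalized]   # approximate posterior density f(z)

def region_for_threshold(k):
    """Return (index of c, index of d, posterior mass) for the region {z : f(z) >= k}."""
    inside = [i for i, f in enumerate(density) if f >= k]
    mass = sum(density[i] for i in inside) * step
    return inside[0], inside[-1], mass

# Bisection on k: raising the threshold can only shrink the region and its mass.
target = 0.95
low, high = 0.0, max(density)
for _ in range(50):
    k = (low + high) / 2
    _, _, mass = region_for_threshold(k)
    if mass > target:
        low = k       # region still holds more than the target: raise the threshold
    else:
        high = k      # region too small: lower the threshold

i_c, i_d, mass = region_for_threshold(low)
c, d = grid[i_c], grid[i_d]
print(c, d, mass, density[i_c], density[i_d])
```

The two density values printed at the end should be nearly equal, matching the condition \(f(c) = f(d) = k\) up to the grid resolution.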