# Overview

• Point estimation: when you have to make a single best guess
• Set estimation: when you want to convey how much confidence you have about this best guess

# First example

• You repeatedly put your finger on an Earth globe at a point chosen uniformly at random
• Each time, you record if you “landed” on water (W) or land (L)
• The goal is to estimate the proportion $$p$$ of Earth covered by water

# Bayes estimator

Recall:

$\color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}$

encodes a 3-step approach applicable to almost any statistical problem:

1. $$\color{red}{\text{Construct a probability model}}$$
2. $$\color{green}{\text{Compute or approximate the posterior distribution conditionally on the actual data at hand}}$$
3. $$\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an action}}$$

For the globe water/land example, steps 1 and 2 are the same as in the Delta Rocket example:

• $$Z$$ is $$p$$
• $$X$$ is the list of W / L outcomes, encoded as 1 for W and 0 for L

# Model

• Joint distribution $$\gamma(p, x)$$ is specified via the chain rule as:
• Uniform prior on the unknown proportion:
• $$p \sim {\text{Unif}}(0, 1)$$
• Each time you put your finger on the globe corresponds to an independent and identically distributed Bernoulli draw with parameter $$p$$:
• $$x_i | p \sim {\text{Bern}}(p)$$
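The model above can be simulated and its posterior approximated on a grid. The following is a minimal sketch, not a definitive implementation: the function names `simulate_tosses` and `grid_posterior`, the seed, the sample size of 20, and the value 0.7 used to generate data are all assumptions made for illustration.

```python
import random


def simulate_tosses(p, n, rng):
    """Draw n i.i.d. Bernoulli(p) outcomes, encoded as 1 for W and 0 for L."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def grid_posterior(xs, grid_size=1001):
    """Approximate the posterior of p on an even grid over [0, 1].

    The Unif(0, 1) prior has density 1, so the unnormalized posterior
    is just the Bernoulli likelihood p^(#W) * (1 - p)^(#L).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    n_water = sum(xs)
    n_land = len(xs) - n_water
    weights = [p ** n_water * (1 - p) ** n_land for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


rng = random.Random(1)              # hypothetical seed, for reproducibility
xs = simulate_tosses(0.7, 20, rng)  # 0.7 is an assumed value, not an estimate
grid, probs = grid_posterior(xs)
```

Here `probs[i]` approximates the posterior probability that $$p$$ lies near `grid[i]`; a finer grid gives a closer approximation.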

# Point estimate

• The goal is to estimate the proportion $$p$$ of Earth covered by water
• Often, you need to provide one numerical best guess
• How to do this optimally?

# Point estimate from Bayes estimator

• Select a loss function
• Solve the optimization problem specified by the Bayes estimator

$\delta^*(X) = \color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}$

Step 3: $$\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an action}}$$

Example: square loss, with $${\mathcal{A}}= {\mathbf{R}}$$ and $$L(a, p) = (a - p)^2$$

\begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[(Z - a)^2 | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[Z^2 | X] - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \end{aligned}

### Poll: $$\delta^*(x)$$ can be simplified to…

1. $$\int z p(z|x) {\text{d}}z$$
2. $${\textrm{argmax}}\{ p(z|x) : z \in {\mathbf{R}}\}$$
3. $$\int x p(z|x) {\text{d}}x$$
4. $${\textrm{argmax}}\{ p(z|x) : x \in {\mathbf{R}}\}$$
5. None of the above

# Point estimate from Bayes estimator

\begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[(Z - a)^2 | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[Z^2 | X] - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ - 2a{\mathbf{E}}[Z | X] + a^2 : a \in {\mathcal{A}}\} \end{aligned}

Idea: think of $${\mathbf{E}}[Z|X]$$ as a constant that you get from the posterior (the term $${\mathbf{E}}[Z^2|X]$$ was dropped because it does not depend on $$a$$). To minimize the bottom expression, take the derivative with respect to $$a$$ and set it to zero:

\begin{aligned} -2 {\mathbf{E}}[Z|X] + 2a = 0 \end{aligned}

Hence, the Bayes estimator here is the posterior mean, $$\delta^*(X) = {\mathbf{E}}[Z|X] = \int z p(z|X) {\text{d}}z$$.
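As a numerical sanity check of this result, the sketch below compares the posterior mean with the action that minimizes the expected square loss over a grid. The data (6 W and 3 L), the grid size, and the helper names are hypothetical choices for illustration. The closed form used for comparison relies on the standard fact that a uniform prior with Bernoulli observations yields a Beta posterior with mean $$(\#W + 1)/(n + 2)$$.

```python
def grid_posterior(n_water, n_land, grid_size=1001):
    """Grid approximation of the posterior of Z = p under a Unif(0, 1) prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [z ** n_water * (1 - z) ** n_land for z in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def expected_square_loss(a, grid, probs):
    """Approximate E[(Z - a)^2 | X] by a sum over the grid."""
    return sum(q * (z - a) ** 2 for z, q in zip(grid, probs))


# Hypothetical data: 6 water and 3 land out of 9 touches.
grid, probs = grid_posterior(n_water=6, n_land=3)

# Posterior mean, i.e. the claimed Bayes estimator under square loss.
posterior_mean = sum(z * q for z, q in zip(grid, probs))

# Brute-force minimization of the expected loss over candidate actions.
best_action = min(grid, key=lambda a: expected_square_loss(a, grid, probs))

# Closed-form posterior mean of the Beta posterior: (6 + 1) / (9 + 2).
closed_form = 7 / 11
```

Up to grid resolution, `best_action`, `posterior_mean`, and `closed_form` should agree.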

# Set estimate

• The weakness of point estimates is that they do not capture the uncertainty around the value
• Idea: instead of returning a single point, return a set of points
• usually an interval,
• but this can be generalized
• Bayesian terminology: credible interval ($$\neq$$ frequentist confidence intervals)
• Goals:
• We would like the credible interval to contain a fixed fraction of the posterior mass (e.g. 95%)
• At the same time, we would like this credible interval to be as short as possible given that posterior mass constraint
• Bayes estimator formalization:
• $${\mathcal{A}}= \{[c, d] : c < d\}$$,
• consider the loss function given by $L([c, d], z) = {{\bf 1}}\{z \notin [c, d]\} + k (d - c)$ for some tuning parameter $$k$$ to be determined later.

We get:

\begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbb{P}}[Z \notin [c, d] | X] + k(d - c) : [c,d] \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbb{P}}[Z < c|X] + {\mathbb{P}}[Z > d |X] + k(d - c) : [c,d] \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbb{P}}[Z \le c|X] - {\mathbb{P}}[Z \le d |X] + k(d - c) : [c,d] \in {\mathcal{A}}\} \end{aligned}

The last line uses $${\mathbb{P}}[Z > d|X] = 1 - {\mathbb{P}}[Z \le d|X]$$ and drops the constant 1, which does not affect the argmin. Changing $$<$$ into $$\le$$ is valid assuming the posterior has a continuous density $$f_{Z|X}$$. Again we take the derivative with respect to $$c$$ and set it to zero, then do the same thing for $$d$$. Notice that $${\mathbb{P}}[Z \le c|X]$$ is the posterior CDF, so taking the derivative with respect to $$c$$ yields a density:

$f_{Z|X}(c) - k = 0,$

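Differentiating with respect to $$d$$ gives the same condition, $$k - f_{Z|X}(d) = 0$$, so we see the optimum will be the shortest interval $$[c, d]$$ such that $$f_{Z|X}(c) = f_{Z|X}(d) = k$$.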

Finally, choose $$k$$ so that the resulting interval captures the desired posterior mass, say 95%.
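One way to carry out this recipe numerically is sketched below, assuming a unimodal posterior approximated on a grid. Grid points are added in decreasing order of density until 95% of the mass is covered, which amounts to lowering the threshold $$k$$ until the mass constraint is met. The function name `hpd_interval` and the data (6 W and 3 L) are hypothetical.

```python
def hpd_interval(n_water, n_land, mass=0.95, grid_size=2001):
    """Grid sketch of the shortest credible interval [c, d].

    Assumes a Unif(0, 1) prior and a unimodal posterior, so that the set
    {z : f(z) >= k} is a single interval.
    """
    h = 1 / (grid_size - 1)
    grid = [i * h for i in range(grid_size)]
    weights = [z ** n_water * (1 - z) ** n_land for z in grid]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Visit grid points from highest to lowest posterior density.
    order = sorted(range(grid_size), key=lambda i: probs[i], reverse=True)
    kept, covered = [], 0.0
    for i in order:
        kept.append(i)
        covered += probs[i]
        if covered >= mass:
            break

    # Approximate density at the last point admitted: the threshold k.
    k = probs[kept[-1]] / h
    return grid[min(kept)], grid[max(kept)], k, covered


# Hypothetical data: 6 water and 3 land out of 9 touches.
c, d, k, covered = hpd_interval(n_water=6, n_land=3)
```

The returned `k` is the approximate common density value at both endpoints, matching the condition $$f(c) = f(d) = k$$. For a multimodal posterior the kept points need not form one interval, so this sketch would not apply as written.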