# Bayes estimators

Recall: the Bayes estimator,

$\color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}$

encodes a 3-step approach applicable to almost any statistical problem:

1. $$\color{red}{\text{Construct a probability model}}$$
2. $$\color{green}{\text{Compute or approximate the posterior distribution conditionally on the actual data at hand}}$$
3. $$\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an "action"}}$$

# Estimators

• We want to devise a decision-making strategy, which we formalize as an estimator:
• a function that takes as input only the observations, $$x$$, and outputs a proposed action, $$\delta(x) \in {\mathcal{A}}$$.
• i.e., $$\delta : {\mathscr{X}}\to {\mathcal{A}}$$
• We want this estimator to be as “good” as possible.
• Under a certain criterion of goodness, we will see that the Bayesian framework provides a principled and systematic way of specifying a “best” estimator.

# Evaluation of estimators

• Frequentist risk: view $$\theta = z$$ as a parameter for a likelihood / indexing probabilities over observables $$\{{\mathbb{P}}_z\}$$, with a corresponding collection of expectation operators $$\{{\mathbf{E}}_z\}$$, \begin{aligned} R(z, \delta) &= {\mathbf{E}}_z[L(\delta(X), z)] \\ &= \int L(\delta(x), z)\ \text{likelihood}(x | z) {\text{d}}x \end{aligned}

• Bayesian notion: integrated risk \begin{aligned} r(\delta) &= {\mathbf{E}}[L(\delta(X), Z)] \\ &= \int \int L(\delta(x), z)\ \text{prior}(z)\ \text{likelihood}(x | z)\ {\text{d}}x {\text{d}}z \end{aligned}

Key difference:

• Frequentist risk: a partial order on estimators
• Only canonical notion of optimality is then non-dominance, called admissibility
• Bayes risk: a complete order on estimators (under weak conditions)
• Can actually get an expression for that optimal estimator
• As a bonus, also satisfies the frequentist notion of non-sub-optimality: Bayes estimators are admissible under weak conditions (more on this later)
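
A connection that makes the comparison concrete: the integrated risk is the prior expectation of the frequentist risk, obtained by exchanging the order of integration,

\begin{aligned} r(\delta) &= \int \left[ \int L(\delta(x), z)\ \text{likelihood}(x | z)\ {\text{d}}x \right] \text{prior}(z)\ {\text{d}}z \\ &= \int R(z, \delta)\ \text{prior}(z)\ {\text{d}}z = {\mathbf{E}}[R(Z, \delta)] \end{aligned}

Averaging over $$z$$ collapses the function $$z \mapsto R(z, \delta)$$ to a single number per estimator, which is why the Bayes risk yields a complete order while the frequentist risk yields only a partial one.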

# The Bayes estimator

So far: abstract definition of Bayes estimators as minimizers of the integrated risk \begin{aligned} \delta^* &= {\textrm{argmin}}_{\delta : {\mathscr{X}}\to {\mathcal{A}}} \{ r(\delta) \} \\ r(\delta) &= {\mathbf{E}}[L(\delta(X), Z)] \end{aligned}

More explicit expression: The estimator $$\delta^*$$, defined by the equation below, minimizes the integrated risk

$\delta^*(X) = {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\}$

This estimator $$\delta^*$$ is called a Bayes estimator.
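A standard special case, useful to keep in mind (the choice of loss here is illustrative, not part of the definition above): under squared loss $$L(a, z) = (a - z)^2$$, the minimization can be done in closed form,

\begin{aligned} {\mathbf{E}}[(a - Z)^2 | X] &= a^2 - 2 a\, {\mathbf{E}}[Z | X] + {\mathbf{E}}[Z^2 | X] \\ \Rightarrow \quad \delta^*(X) &= {\mathbf{E}}[Z | X], \end{aligned}

since differentiating in $$a$$ and setting the derivative to zero gives $$a = {\mathbf{E}}[Z | X]$$: the Bayes estimator is the posterior mean. Similarly, the absolute loss $$L(a, z) = |a - z|$$ yields the posterior median.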

This means that given a model and a goal, the Bayesian framework provides, in principle, a recipe for constructing an estimator.

However, the computation required to implement this recipe may be considerable. This explains why computational statistics plays a large role in Bayesian statistics and in this course.

# Black box optimization

• Approximate the objective function using $$M$$ Monte Carlo samples $$Z_1, \dots, Z_M$$ drawn from the posterior given $$X$$:

\begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &\approx {\textrm{argmin}}\{ \frac{1}{M} \sum_{i=1}^M L(a, Z_i) : a \in {\mathcal{A}}\} \\ \end{aligned}

• Idea that could be part of a project: stochastic gradient meets Bayes estimators
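A minimal sketch of this Monte Carlo approximation in Python; the Beta posterior, the squared loss, and the grid of candidate actions are all illustrative assumptions, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior: Z | X ~ Beta(7, 5), e.g. after observing
# 6 successes in 10 Bernoulli trials under a uniform prior.
M = 10_000
posterior_samples = rng.beta(7, 5, size=M)  # Z_1, ..., Z_M

def loss(a, z):
    return (a - z) ** 2  # squared loss; any L(a, z) could be plugged in

# Approximate E[L(a, Z) | X] by (1/M) * sum_i L(a, Z_i), then
# minimize over a grid of candidate actions a in A = [0, 1].
actions = np.linspace(0.0, 1.0, 201)
mc_objective = np.array([loss(a, posterior_samples).mean() for a in actions])
bayes_action = actions[mc_objective.argmin()]

# Under squared loss the exact Bayes estimator is the posterior mean,
# 7/12 ~ 0.583, so bayes_action should land near that value.
```

With a smooth loss, the grid search could be replaced by a gradient-based optimizer on the Monte Carlo objective, which is the starting point of the stochastic-gradient project idea below.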

• If $${\mathcal{A}}$$ is tricky to explore (combinatorial, constrained such as in the motivating tracking problem, etc.) and $${\mathcal{A}}\subseteq {\mathscr{Z}}$$, we can further approximate both the objective and the constraint as follows:

\begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &\approx {\textrm{argmin}}\{ \frac{1}{M} \sum_{i=1}^M L(a, Z_i) : a \in \{Z_1, Z_2, \dots, Z_M\} \} \\ \end{aligned}

• Idea that could be part of a project: Bayes estimator for feature matrices
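A sketch of this second approximation, restricting the argmin to the posterior samples themselves; the binary "feature indicator" space and the Hamming loss are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior samples Z_1, ..., Z_M over a combinatorial
# space: binary vectors of length 5 (e.g. feature indicators), each
# coordinate independently equal to 1 with probability 0.7.
M = 500
Z = rng.binomial(1, 0.7, size=(M, 5))

# Hamming loss L(a, z) = number of mismatched coordinates, computed
# for all pairs at once by broadcasting: pairwise[i, j] = L(Z[i], Z[j]).
pairwise = (Z[:, None, :] != Z[None, :, :]).sum(axis=-1)

# Restrict the candidate actions to {Z_1, ..., Z_M}: pick the sample
# minimizing the Monte Carlo average (1/M) * sum_j L(a, Z_j).
best = Z[pairwise.mean(axis=1).argmin()]
```

This sidesteps optimizing over the full combinatorial space: the argmin becomes a search over $$M$$ candidates, at the cost of a coarser approximation of $$\delta^*(X)$$.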

# Bayes estimators from a frequentist perspective

Recall: admissibility, a frequentist notion of optimality (or rather, non-sub-optimality).

• An estimator $$\delta$$ is admissible if there is no dominating estimator $$\delta'$$
• Domination is with respect to the frequentist risk $$R(z, \delta) = {\mathbf{E}}_z[L(\delta(X), z)]$$,
• i.e., $$\delta$$ is admissible if there is no $$\delta'$$ such that $$R(z, \delta') \le R(z, \delta)$$ for all $$z$$, with strict inequality for at least one $$z$$

Proposition: if a Bayes estimator is unique, it is admissible.

To show uniqueness, one may, for example, try to exploit strict convexity of the loss function.
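
A sketch of the standard argument behind the proposition: if some $$\delta'$$ dominated the unique Bayes estimator $$\delta^*$$, then integrating the dominance inequality against the prior would give

\begin{aligned} R(z, \delta') \le R(z, \delta^*) \text{ for all } z \quad \Rightarrow \quad r(\delta') = \int R(z, \delta')\ \text{prior}(z)\ {\text{d}}z \le r(\delta^*), \end{aligned}

so $$\delta'$$ would also minimize the integrated risk, i.e., $$\delta'$$ would be a Bayes estimator as well, contradicting uniqueness.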

# Reminder

No lecture on February 16 and 18 (reading week)