Bayes estimators: properties and optimization

Alexandre Bouchard-Côté

Bayes estimators

Recall: the Bayes estimator,

\[\color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}\]

encodes a three-step approach applicable to almost any statistical problem:

  1. \(\color{red}{\text{Construct a probability model}}\)
  2. \(\color{green}{\text{Compute or approximate the posterior distribution conditionally on the actual data at hand}}\)
  3. \(\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an "action"}}\)
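
The three steps above can be sketched on a toy problem. Everything specific here is an assumption made for illustration and not part of the lecture: a coin-flip model with a uniform prior on the success probability, hypothetical data of 7 successes in 10 trials, a grid discretization of the parameter, and the squared loss.

```python
import numpy as np

# Step 1: probability model (assumed for illustration).
# Z ~ Uniform(0, 1), discretized on a grid; X | Z ~ Binomial(n, Z).
z_grid = np.linspace(0.0, 1.0, 1001)
prior = np.full(z_grid.size, 1.0 / z_grid.size)

def likelihood(x, n, z):
    # Binomial likelihood up to the binomial coefficient, which is constant in z.
    return z**x * (1.0 - z) ** (n - x)

# Step 2: posterior conditionally on the observed data (hypothetical data).
n, x = 10, 7
unnormalized = prior * likelihood(x, n, z_grid)
posterior = unnormalized / unnormalized.sum()

# Step 3: optimization over actions, here with the squared loss L(a, z) = (a - z)^2.
actions = z_grid
# expected_loss[j] = sum_k posterior[k] * (actions[j] - z_grid[k])^2
expected_loss = ((actions[:, None] - z_grid[None, :]) ** 2) @ posterior
bayes_action = actions[np.argmin(expected_loss)]
print(bayes_action)
```

With the squared loss, the result should land close to the posterior mean, which for this conjugate model is (1 + 7) / (2 + 10) = 8/12. Swapping in another loss only changes Step 3.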

Estimators

Evaluation of estimators

Key difference:

The Bayes estimator

So far: abstract definition of Bayes estimators as minimizers of the integrated risk \[ \begin{aligned} \delta^* &= {\textrm{argmin}}_{\delta : {\mathscr{X}}\to {\mathcal{A}}} \{ r(\delta) \} \\ r(\delta) &= {\mathbf{E}}[L(\delta(X), Z)] \end{aligned} \]

More explicit expression: The estimator \(\delta^*\), defined by the equation below, minimizes the integrated risk

\[ \delta^*(X) = {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \]

This estimator \(\delta^*\) is called a Bayes estimator.
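
A short argument for why the pointwise minimizer also minimizes the integrated risk, sketched under the assumption that the minimum over \({\mathcal{A}}\) is attained for each value of \(X\) and that the resulting \(\delta^*\) is measurable. For any estimator \(\delta\), the law of total expectation gives

\[ \begin{aligned} r(\delta) &= {\mathbf{E}}[ {\mathbf{E}}[L(\delta(X), Z) | X] ] \\ &\ge {\mathbf{E}}[ \min \{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} ] \\ &= {\mathbf{E}}[ {\mathbf{E}}[L(\delta^*(X), Z) | X] ] = r(\delta^*) \end{aligned} \]

The inequality holds because, for each fixed \(X\), the action \(\delta(X)\) is one of the candidates \(a \in {\mathcal{A}}\).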

This means that given a model and a goal (encoded as a loss function \(L\)), the Bayesian framework provides, in principle, a recipe for constructing an estimator.

However, the computation required to implement this recipe may be considerable. This explains why computational statistics plays a large role in Bayesian statistics and in this course.

Black box optimization

Monte Carlo approximation, with \(Z_1, \dots, Z_M\) denoting (approximate) samples from the posterior of \(Z\) given \(X\):

\[ \begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &\approx {\textrm{argmin}}\{ \frac{1}{M} \sum_{i=1}^M L(a, Z_i) : a \in {\mathcal{A}}\} \end{aligned} \]

Further restricting the candidate actions to the sampled values:

\[ \begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &\approx {\textrm{argmin}}\{ \frac{1}{M} \sum_{i=1}^M L(a, Z_i) : a \in \{Z_1, Z_2, \dots, Z_M\} \} \end{aligned} \]
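
A minimal sketch of the two approximations, treating the loss as a black box that is only ever evaluated. The function names are hypothetical, and the normal draws merely stand in for posterior samples \(Z_1, \dots, Z_M\), which in practice would come from a sampler for the actual model.

```python
import numpy as np

def bayes_action_over_candidates(samples, loss, candidates):
    """Return the candidate a minimizing (1/M) * sum_i loss(a, Z_i)."""
    samples = np.asarray(samples)
    candidates = np.asarray(candidates)
    # risks[j] is the Monte Carlo estimate of E[L(candidates[j], Z) | X].
    risks = np.array([np.mean(loss(a, samples)) for a in candidates])
    return candidates[np.argmin(risks)]

def bayes_action_over_samples(samples, loss):
    """Second approximation: search only among the sampled values themselves."""
    return bayes_action_over_candidates(samples, loss, samples)

# Stand-in for posterior samples (assumption: not from any real model).
rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=1.0, size=2001)

def squared_loss(a, z):
    return (a - z) ** 2

def absolute_loss(a, z):
    return np.abs(a - z)

# First approximation: finite grid standing in for the action space.
grid = np.linspace(-5.0, 10.0, 1501)
a_squared = bayes_action_over_candidates(z, squared_loss, grid)

# Second approximation: candidates are the samples.
a_absolute = bayes_action_over_samples(z, absolute_loss)

print(a_squared, z.mean())
print(a_absolute, np.median(z))
```

The second variant needs no parameterization of \({\mathcal{A}}\), which is convenient when actions are structured objects, at the cost of \(M^2\) loss evaluations. Here the squared loss should recover roughly the sample mean and the absolute loss roughly the sample median.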

Bayes estimators from a frequentist perspective

Recall: admissibility, a frequentist notion of optimality (or rather, non-sub-optimality).

Proposition: if a Bayes estimator is unique, it is admissible.

To show uniqueness, one may try, for example, to use convexity of the loss function.
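
As an illustration of the convexity route, consider the squared loss \(L(a, z) = (a - z)^2\) with \({\mathcal{A}}= \mathbb{R}\), assuming the posterior variance is finite (this example is a sketch added for concreteness). Expanding around the posterior mean,

\[ {\mathbf{E}}[(a - Z)^2 | X] = (a - {\mathbf{E}}[Z | X])^2 + {\textrm{Var}}[Z | X] \]

The right-hand side is strictly convex in \(a\), so its minimizer is unique: \(\delta^*(X) = {\mathbf{E}}[Z | X]\). Provided the integrated risk is finite, the proposition then suggests that the posterior mean is admissible under the squared loss.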

Reminder

No lecture on February 16 and 18 (reading week)