# Bayes estimators

Recall: the Bayes estimator,

$\color{blue}{{\textrm{argmin}}} \{ \color{red}{{\mathbf{E}}}[\color{blue}{L}(a, Z) \color{green}{| X}] : a \in {\mathcal{A}}\}$

encodes a 3-step approach applicable to almost any statistical problem:

1. $$\color{red}{\text{Construct a probability model}}$$
2. $$\color{green}{\text{Compute or approximate the posterior distribution conditionally on the actual data at hand}}$$
3. $$\color{blue}{\text{Solve an optimization problem to turn the posterior distribution into an "action"}}$$

# Estimators

• We want to devise a decision-making strategy, which we formalize as an estimator:
• a function that takes as input only the observations, $$x$$, and outputs a proposed action, $$\delta(x) \in {\mathcal{A}}$$.
• i.e., $$\delta : {\mathscr{X}}\to {\mathcal{A}}$$
• We want this estimator to be as “good” as possible.
• Under a certain criterion of goodness, we will see that the Bayesian framework provides a principled and systematic way of specifying a “best” estimator.

# Evaluation of estimators

• Frequentist risk: view $$\theta = z$$ as a parameter for a likelihood / indexing probabilities over observables $$\{{\mathbb{P}}_z\}$$, with a corresponding collection of expectation operators $$\{{\mathbf{E}}_z\}$$, \begin{aligned} R(z, \delta) &= {\mathbf{E}}_z[L(\delta(X), z)] \\ &= \int L(\delta(x), z)\ \text{likelihood}(x | z) {\text{d}}x \end{aligned}

• Bayesian notion: integrated risk \begin{aligned} r(\delta) &= {\mathbf{E}}[L(\delta(X), Z)] \\ &= \int \int L(\delta(x), z)\ \text{prior}(z)\ \text{likelihood}(x | z)\ {\text{d}}x {\text{d}}z \end{aligned}

Key difference:

• Frequentist risk: a partial order on estimators
• Only canonical notion of optimality is then non-dominance, called admissibility
• Bayes risk: a complete order on estimators (under weak conditions)
• Can actually get an expression for that optimal estimator
• As a bonus, also satisfies the frequentist notion of non-sub-optimality: Bayes estimators are admissible under weak conditions (more on this later)
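
A connection that makes the comparison concrete: the integrated risk is the prior expectation of the frequentist risk, obtained by exchanging the order of integration,

\begin{aligned} r(\delta) &= \int \left[ \int L(\delta(x), z)\ \text{likelihood}(x | z)\ {\text{d}}x \right] \text{prior}(z)\ {\text{d}}z \\ &= \int R(z, \delta)\ \text{prior}(z)\ {\text{d}}z = {\mathbf{E}}[R(Z, \delta)] \end{aligned}

Averaging over $$z$$ collapses the function $$z \mapsto R(z, \delta)$$ to a single number per estimator, which is why the Bayes risk yields a complete order while the frequentist risk yields only a partial one.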

# The Bayes estimator

So far: abstract definition of Bayes estimators as minimizers of the integrated risk \begin{aligned} \delta^* &= {\textrm{argmin}}_{\delta : {\mathscr{X}}\to {\mathcal{A}}} \{ r(\delta) \} \\ r(\delta) &= {\mathbf{E}}[L(\delta(X), Z)] \end{aligned}

More explicit expression: The estimator $$\delta^*$$, defined by the equation below, minimizes the integrated risk

$\delta^*(X) = {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\}$

This estimator $$\delta^*$$ is called a Bayes estimator.
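A standard special case, useful to keep in mind (the choice of loss here is illustrative, not part of the definition above): under squared loss $$L(a, z) = (a - z)^2$$, the minimization can be done in closed form,

\begin{aligned} {\mathbf{E}}[(a - Z)^2 | X] &= a^2 - 2 a\, {\mathbf{E}}[Z | X] + {\mathbf{E}}[Z^2 | X] \\ \Rightarrow \quad \delta^*(X) &= {\mathbf{E}}[Z | X], \end{aligned}

since differentiating in $$a$$ and setting the derivative to zero gives $$a = {\mathbf{E}}[Z | X]$$: the Bayes estimator is the posterior mean. Similarly, the absolute loss $$L(a, z) = |a - z|$$ yields the posterior median.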

This means that given a model and a goal, the Bayesian framework provides, in principle, a recipe for constructing an estimator.

However, the computation required to implement this recipe may be considerable. This explains why computational statistics plays a large role in Bayesian statistics and in this course.

# Black box optimization

• Approximate the objective function using $$M$$ Monte Carlo samples $$Z_1, \dots, Z_M$$ drawn from the posterior given $$X$$:

\begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &\approx {\textrm{argmin}}\{ \frac{1}{M} \sum_{i=1}^M L(a, Z_i) : a \in {\mathcal{A}}\} \\ \end{aligned}

• Idea that could be part of a project: stochastic gradient meets Bayes estimators
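A minimal sketch of this Monte Carlo approximation in Python; the Beta posterior, the squared loss, and the grid of candidate actions are all illustrative assumptions, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior: Z | X ~ Beta(7, 5), e.g. after observing
# 6 successes in 10 Bernoulli trials under a uniform prior.
M = 10_000
posterior_samples = rng.beta(7, 5, size=M)  # Z_1, ..., Z_M

def loss(a, z):
    return (a - z) ** 2  # squared loss; any L(a, z) could be plugged in

# Approximate E[L(a, Z) | X] by (1/M) * sum_i L(a, Z_i), then
# minimize over a grid of candidate actions a in A = [0, 1].
actions = np.linspace(0.0, 1.0, 201)
mc_objective = np.array([loss(a, posterior_samples).mean() for a in actions])
bayes_action = actions[mc_objective.argmin()]

# Under squared loss the exact Bayes estimator is the posterior mean,
# 7/12 ~ 0.583, so bayes_action should land near that value.
```

With a smooth loss, the grid search could be replaced by a gradient-based optimizer on the Monte Carlo objective, which is the starting point of the stochastic-gradient project idea below.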

• If $${\mathcal{A}}$$ is tricky to explore (combinatorial, constrained such as in the motivating tracking problem, etc.) and $${\mathcal{A}}\subseteq {\mathscr{Z}}$$, we can further approximate both the objective and the constraint as follows:

\begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &\approx {\textrm{argmin}}\{ \frac{1}{M} \sum_{i=1}^M L(a, Z_i) : a \in \{Z_1, Z_2, \dots, Z_M\} \} \\ \end{aligned}

• Idea that could be part of a project: Bayes estimator for feature matrices
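A sketch of this second approximation, restricting the argmin to the posterior samples themselves; the binary "feature indicator" space and the Hamming loss are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior samples Z_1, ..., Z_M over a combinatorial
# space: binary vectors of length 5 (e.g. feature indicators), each
# coordinate independently equal to 1 with probability 0.7.
M = 500
Z = rng.binomial(1, 0.7, size=(M, 5))

# Hamming loss L(a, z) = number of mismatched coordinates, computed
# for all pairs at once by broadcasting: pairwise[i, j] = L(Z[i], Z[j]).
pairwise = (Z[:, None, :] != Z[None, :, :]).sum(axis=-1)

# Restrict the candidate actions to {Z_1, ..., Z_M}: pick the sample
# minimizing the Monte Carlo average (1/M) * sum_j L(a, Z_j).
best = Z[pairwise.mean(axis=1).argmin()]
```

This sidesteps optimizing over the full combinatorial space: the argmin becomes a search over $$M$$ candidates, at the cost of a coarser approximation of $$\delta^*(X)$$.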

# Bayes estimators from a frequentist perspective

Recall: admissibility, a frequentist notion of optimality (or rather, non-sub-optimality).

• An estimator $$\delta$$ is admissible if there is no dominating estimator $$\delta'$$
• Domination is with respect to the frequentist risk $$R(z, \delta) = {\mathbf{E}}_z[L(\delta(X), z)]$$,
• i.e., $$\delta$$ is admissible if there is no $$\delta'$$ such that $$R(z, \delta') \le R(z, \delta)$$ for all $$z$$, with strict inequality for at least one $$z$$

Proposition: if a Bayes estimator is unique, it is admissible.

To show uniqueness, one may, for example, try to exploit strict convexity of the loss function.
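
A sketch of the standard argument behind the proposition: if some $$\delta'$$ dominated the unique Bayes estimator $$\delta^*$$, then integrating the dominance inequality against the prior would give

\begin{aligned} R(z, \delta') \le R(z, \delta^*) \text{ for all } z \quad \Rightarrow \quad r(\delta') = \int R(z, \delta')\ \text{prior}(z)\ {\text{d}}z \le r(\delta^*), \end{aligned}

so $$\delta'$$ would also minimize the integrated risk, i.e., $$\delta'$$ would be a Bayes estimator as well, contradicting uniqueness.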

# Reminder

No lecture on February 16 and 18 (reading week)