Alexandre Bouchard-Côté

- Note: there are many more interesting posts that I loved following; I'm just picking a few because of time constraints
- Keep up the good work!

- A density plot is an estimator-visualization method that is not well suited to distributions that are mixtures of continuous components and Dirac deltas.
- For example, one thing it does not convey well is how much mass sits on very small values versus exactly zero.

- Showing two plots (the PMF of the indicator of zero plus a density of the continuous part) is a workaround.
- An interesting question is how to summarize posterior distributions over spike and slab variables.
- Potential project?
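The two-summary workaround can be sketched numerically. This is a minimal sketch with numpy using a hypothetical spike-and-slab posterior sample (the 0.3 spike probability and the Normal(0.5, 0.2) slab are made up for illustration): report the point mass at zero and the continuous part separately, so the spike is not smeared into the density.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior samples for a spike-and-slab slope:
# with prob 0.3 the slope is exactly zero, otherwise ~ Normal(0.5, 0.2).
n = 10_000
indicator = rng.random(n) < 0.3
samples = np.where(indicator, 0.0, rng.normal(0.5, 0.2, size=n))

# Summary 1: PMF of the "exactly zero" indicator.
p_zero = np.mean(samples == 0.0)

# Summary 2: describe the continuous (slab) part separately,
# so small-but-nonzero values are not conflated with the spike.
nonzero = samples[samples != 0.0]
q = np.quantile(nonzero, [0.025, 0.5, 0.975])

print(f"P(slope = 0) ~ {p_zero:.3f}")
print(f"slab part: median {q[1]:.3f}, 95% interval [{q[0]:.3f}, {q[2]:.3f}]")
```

In practice the second summary would be a density plot of `nonzero` only; the quantiles above stand in for it here.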

Also related to: “Overly diffuse (non-informative) priors biasing the Bayes factor?” — Sierra

- Recall, context is the regression exercise: Regression.html
- I was surprised that the model did not put more mass on “using a non-zero slope” (based on this visualization, there seemed to be a solid trend)
- Modifying the prior nudged it to put more mass on “using a non-zero slope”
- Orthogonal question: has anybody tried a change-point version? Did that model support a change?

- One answer that we will cover later: reference priors (so let’s defer this part of the discussion until later)
- “I tried a number of different parameters for priors, and pick the ones that give reasonable predictions before feeding in data.”
- Great approach
- How to implement?

**Idea**: performing forward simulation from models based on different priors.

- See the optional textbook “Statistical rethinking” for some examples of this approach
- Here is one where the author was designing priors to do regression on people’s heights based on their weight: https://speakerdeck.com/rmcelreath/l03-statistical-rethinking-winter-2019?slide=40
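The forward-simulation idea can be sketched as follows, in the spirit of the heights-on-weights example above. The priors, weight range, and the 272 cm sanity cutoff are illustrative choices, not the ones from the slides: simulate regression lines from the prior alone and count how many are physically absurd.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior predictive check for a regression of height (cm) on weight (kg).
# All numbers below are illustrative, not the ones used in class.
n_sims = 1_000
weight = np.linspace(40, 100, 50)
alpha = rng.normal(178, 20, size=n_sims)   # intercept at mean weight
beta = rng.normal(0, 10, size=n_sims)      # vague slope prior

heights = alpha[:, None] + beta[:, None] * (weight - weight.mean())

# A prior curve is "absurd" if it predicts a negative height or one
# taller than the tallest recorded human (~272 cm) anywhere in range.
absurd = np.mean((heights < 0).any(axis=1) | (heights > 272).any(axis=1))

# Compare with a more thoughtful slope prior: positive, median 1 cm/kg.
beta2 = rng.lognormal(0, 1, size=n_sims)
heights2 = alpha[:, None] + beta2[:, None] * (weight - weight.mean())
absurd2 = np.mean((heights2 < 0).any(axis=1) | (heights2 > 272).any(axis=1))

print(f"absurd fraction, vague prior: {absurd:.2f}")
print(f"absurd fraction, lognormal prior: {absurd2:.2f}")
```

The point is the comparison: the vague prior generates mostly impossible height curves before seeing any data, which is exactly what this kind of forward simulation is meant to catch.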

**Key point here**: “the ones that give reasonable predictions **before** feeding in data.” If you use the data twice (once to fix the prior and a second time to compute a posterior), a key theoretical calibration guarantee is voided!
- By Thursday, you should be able to see why this is the case. If not, please bring this up on Piazza next week!

- “I chose values (mean and standard deviation) based on probabilities that I think makes sense”
- Another great answer, related to the previous one. Often one of the first ones given by textbooks.
- But in practice I often find this difficult.
- The trick from the last slide helps, but for complex models it seems this should be automated…

- Constraints are more naturally stated on the observations
**Project idea**: Suppose you have a constraint (a binary statement that depends on \(z\) and \(x\)) and some confidence, say \(95\%\), that this constraint holds. Can you propose an automatic way to incorporate this constraint into the prior (assuming you are using a PPL)? If you have several constraints, can you then “learn” the strength of the confidence instead of fixing it at \(95\%\)? What if you have several experts?
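One very rough starting point for this project idea, sketched by rejection/reweighting of prior draws; this is not a full answer, and every number below (the Normal(0, 10) prior, the reference input, the [0, 300] constraint, the 95% confidence) is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch: fold a constraint on observables into the prior by soft
# rejection. Hypothetical setup: prior slope ~ Normal(0, 10); the
# constraint C says the prediction at a reference input lies in
# [0, 300], held with 95% confidence.
n = 50_000
slope = rng.normal(0, 10, size=n)
pred = 178 + slope * (50 - 70)          # prediction at reference input x = 50
satisfies = (pred >= 0) & (pred <= 300)

# Soft constraint: keep a draw with prob 0.95 if it satisfies C,
# with prob 0.05 otherwise. Surviving draws approximate the
# constraint-adjusted prior.
keep = rng.random(n) < np.where(satisfies, 0.95, 0.05)
constrained = slope[keep]

print(f"kept {len(constrained)} / {n} draws")
print(f"prior sd {slope.std():.2f} -> constrained sd {constrained.std():.2f}")
```

"Learning" the confidence level, as the project asks, would mean treating the 0.95 as an unknown with its own prior rather than fixing it as above.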

- Great reference posted by Jim Zidek: suppose you want to aggregate opinions from many experts: https://projecteuclid.org/download/pdf_1/euclid.ss/1177013825
- See also fascinating case studies in https://www.sciencedirect.com/science/article/pii/S0951832007000944?via%3Dihub

- Please use that thread ([@52](https://piazza.com/class/kjuc7ie4sl71rs?cid=52)) to continue to post any pointers you find interesting about prior construction! I can add more too if people are interested.
- See also reference https://www.nature.com/articles/s43586-020-00001-2 posted by Sierra, section on priors has many pointers

\[\Pi_n(\cdot) = \Pi_n(\cdot | X_1, X_2, \dots, X_n)\]

Fair question!!! Let us walk through what the notation means for a concrete problem we know well: the beta-binomial…

Context:

- Bayesian notion of consistency under \(\theta\)
- setup:
- assume the “exchangeable setup”
- let \(\Pi_n(\cdot) = \Pi_n(\cdot | X_1, X_2, \dots, X_n)\) denote the (random) posterior (why random: as in the frequentist case, it is the composition of the observation random variables with the posterior update map)
- ask that \(\Pi_n(A) \to \delta_\theta(A)\) for any \(A\), under a suitable notion of convergence of random variables…
- Bayesians ask that the above holds \({\mathbb{P}}\)-almost surely, i.e.
*for a set of “good” \(\theta\)’s, denoted \(G\), such that this good set has probability one under the prior, \(\pi_0(G) = 1\).*
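To make \(\Pi_n(A) \to \delta_\theta(A)\) concrete for the beta-binomial, here is a small numerical illustration (numpy only; the true \(\theta = 0.7\), the uniform Beta(1, 1) prior, and the interval half-width 0.05 are arbitrary choices): as \(n\) grows, the posterior mass of a fixed interval \(A\) around the true \(\theta\) approaches 1.

```python
import numpy as np

rng = np.random.default_rng(3)

theta_true = 0.7
a0, b0 = 1.0, 1.0   # uniform Beta(1, 1) prior

def interval_mass(a, b, lo, hi, grid=200_000):
    """Posterior mass of (lo, hi) under Beta(a, b), via a log-space grid
    (stable even for large a, b)."""
    t = np.linspace(1e-9, 1 - 1e-9, grid)
    logd = (a - 1) * np.log(t) + (b - 1) * np.log1p(-t)
    d = np.exp(logd - logd.max())
    d /= d.sum()                    # discrete normalization on the grid
    return d[(t >= lo) & (t <= hi)].sum()

eps = 0.05
masses = []
for n in [10, 100, 1_000, 10_000]:
    x = rng.binomial(1, theta_true, size=n)
    a, b = a0 + x.sum(), b0 + n - x.sum()   # conjugate posterior Beta(a, b)
    masses.append(interval_mass(a, b, theta_true - eps, theta_true + eps))

for n, m in zip([10, 100, 1_000, 10_000], masses):
    print(f"n = {n:6d}:  Pi_n((theta - eps, theta + eps)) = {m:.4f}")
```

Here \(A = (\theta - \epsilon, \theta + \epsilon)\) with \(\delta_\theta(A) = 1\), and the printed masses show \(\Pi_n(A)\) converging to 1.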


**Theorem (Doob’s consistency):** if an exchangeable model is identifiable, Bayesian consistency holds for this model.

- Great historical example: German tank problem
- How many animals are in a region, based on capture-recapture data?
- Epidemiology: how many people infected from noisy reporting?
- Also, coming up soon: prior on the number of mixture components.
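The German tank problem fits in a few lines of numpy. A minimal sketch under a flat prior; the observed serial numbers and the cap of 1000 on the support are made up: serials are drawn uniformly from \(\{1, \dots, N\}\), so the likelihood is \(\propto N^{-k}\) on \(N \ge \max(\text{obs})\).

```python
import numpy as np

# Bayesian German tank problem (illustrative numbers, not from class):
# observe serial numbers drawn uniformly from {1, ..., N}; infer N.
obs = np.array([14, 37, 62, 88])     # hypothetical observed serials
m, k = obs.max(), len(obs)

N_grid = np.arange(m, 1001)                    # support: N >= max observation
prior = np.ones_like(N_grid, dtype=float)      # flat prior up to a cap of 1000
like = N_grid.astype(float) ** (-k)            # P(obs | N) proportional to N^{-k}
post = prior * like
post /= post.sum()

post_mean = (N_grid * post).sum()
print(f"posterior mean of N ~ {post_mean:.1f}, P(N = {m}) = {post[0]:.3f}")
```

The same grid-over-a-discrete-unknown pattern applies to the capture-recapture and infection-count examples, and to a prior on the number of mixture components.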

- Peer reviewing system: can you access the reviewer comments? Any comments or proposed improvements?
- Other comments / proposed improvements?