# Some recent questions on Piazza

• Note: there are many more interesting posts that I loved following; I am picking just a few because of time constraints
• Keep up the good work!

### “Why do we care about Probability mass function for slope first rather than Density plot for slope” — Anon

• a density plot is a visualization method that is not well suited to distributions mixing a continuous component with Dirac deltas (point masses)
• For example, one thing it does not convey well is how much mass is on very small values versus exactly zero.
• Showing the two plots (PMF of indicator on zero + density) is a workaround.
• An interesting question is how to summarize posterior distributions over spike and slab variables.
• Potential project?
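A minimal sketch of the two-plot workaround, using hypothetical posterior samples (the mixture weight, slab parameters, and sample sizes below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior samples from a spike-and-slab model:
# 40% of draws are exactly zero (spike), the rest come from the slab.
spike = np.zeros(4000)
slab = rng.normal(loc=0.8, scale=0.3, size=6000)
slope_samples = np.concatenate([spike, slab])

# Summary 1: PMF of the indicator "slope is exactly zero".
p_zero = np.mean(slope_samples == 0.0)

# Summary 2: density summary of the continuous (non-zero) part only;
# a histogram stands in for a kernel density estimate here.
nonzero = slope_samples[slope_samples != 0.0]
counts, edges = np.histogram(nonzero, bins=30, density=True)

print(f"P(slope = 0 | data) ~ {p_zero:.2f}")
print(f"E[slope | slope != 0, data] ~ {nonzero.mean():.2f}")
```

A single density plot over `slope_samples` would smear the point mass at zero into the small-but-nonzero values; reporting `p_zero` separately keeps the two apart.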

### “I’d like to ask how others interpreted this question: How would you proceed to set the prior distributions in this application example?” — Initiated by Anon, contributed by 4 Anons

Also related to: “Overly diffuse (non-informative) priors biasing the Bayes factor?” — Sierra

• Recall, context is the regression exercise: Regression.html
• I was surprised that the model did not put more mass on “using non-zero slope” (based on the visualization of the data, there seemed to be a solid trend)
• modifying the prior nudged it to put more mass on “using non-zero slope”
• orthogonal question: has anybody tried a change-point version? Did that model support a change?
• One answer that we will cover later: reference priors (so let’s defer this part of the discussion until later)
• “I tried a number of different parameters for priors, and pick the ones that give reasonable predictions before feeding in data.”
• Great approach
• How to implement?

Idea: performing forward simulation from models based on different priors.

• See the optional text book “Statistical rethinking” for some examples of this approach
• Key point here: “the ones that give reasonable predictions before feeding in data”
• If you use the data twice (once to fix the prior and a second time to compute the posterior), a key theoretical calibration guarantee is voided!
• By Thursday, you should be able to see why this is the case. If not, please bring this up in Piazza next week!
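A minimal sketch of forward simulation from the prior, assuming a simple linear regression $$y \sim \text{Normal}(a + b x, \sigma)$$; the prior choices and the helper name `prior_predictive` are illustrative, not the ones from the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior predictive simulation: simulate whole datasets from the model
# *before* seeing any data, to check whether a candidate prior yields
# plausible observations. All prior choices below are assumptions.
def prior_predictive(n_sims, x):
    a = rng.normal(0.0, 10.0, size=n_sims)      # candidate prior on intercept
    b = rng.normal(0.0, 1.0, size=n_sims)       # candidate prior on slope
    sigma = rng.exponential(1.0, size=n_sims)   # candidate prior on noise scale
    # One synthetic dataset per draw of the parameters.
    noise = rng.normal(size=(n_sims, len(x))) * sigma[:, None]
    return a[:, None] + b[:, None] * x[None, :] + noise

x = np.linspace(0, 10, 20)
y_sim = prior_predictive(1000, x)

# Check the simulated observations on a scale you can reason about,
# e.g. quantiles of the simulated y values.
lo, hi = np.quantile(y_sim, [0.05, 0.95])
print(f"90% of prior-predictive y values fall in [{lo:.1f}, {hi:.1f}]")
```

If the simulated range is absurd for the application (say, negative counts or implausibly large values), adjust the prior and re-simulate, all without touching the data.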
• “I chose values (mean and standard deviation) based on probabilities that I think makes sense”
• Another great answer, related to the previous one. Often one of the first answers given by textbooks.
• But in practice I often find this difficult.
• The trick from the last slide helps, but for complex models it seems this should be automated…
• Constraints are more naturally stated on the observations
• Project idea
• Suppose you have a constraint (a binary statement that depends on $$z$$ and $$x$$) and some confidence, say $$95\%$$, that this constraint holds. Can you propose an automatic way to incorporate this constraint into the prior (assuming you are using a PPL)? If you have several constraints, can you then “learn” the strength of the confidence instead of fixing it at $$95\%$$? What if you have several experts?
• Great reference posted by Jim Zidek: suppose you want to aggregate opinions from many experts: https://projecteuclid.org/download/pdf_1/euclid.ss/1177013825
• Please use that thread ([@52](https://piazza.com/class/kjuc7ie4sl71rs?cid=52)) to continue to post any pointers you find interesting about prior construction! I can add more too if people are interested.

### “I’m still a bit lost on what the following equation is representing” — Anon

$\Pi_n(\cdot) = \Pi_n(\cdot | X_1, X_2, \dots, X_n)$

Fair question!!! Let us walk through what the notation means for a concrete problem we know well: the beta-binomial model…

Context:

• Bayesian notion of consistency under $$\theta$$
• setup:
• assume the “exchangeable setup”
• let $$\Pi_n(\cdot) = \Pi_n(\cdot | X_1, X_2, \dots, X_n)$$ denote the (random) posterior (why random: as in the frequentist case, it is the composition of the random observations with the posterior update map)
• ask that $$\Pi_n(A) \to \delta_\theta(A)$$ for any $$A$$, under a suitable notion of convergence of random variables…
• Bayesians ask that the above holds $${\mathbb{P}}$$-almost surely, i.e. for a set of “good” $$\theta$$’s, denoted $$G$$, such that this good set has probability one under the prior: $$\pi_0(G) = 1$$.
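To make the notation concrete, here is a small beta-binomial simulation (all numbers are illustrative) showing $$\Pi_n(A)$$ approaching $$\delta_\theta(A) = 1$$ for a neighbourhood $$A$$ of the true $$\theta$$:

```python
import numpy as np

rng = np.random.default_rng(42)

# Beta-binomial illustration of Pi_n(.) = Pi_n(. | X_1, ..., X_n):
# with a Beta(1, 1) prior and X_i | theta ~ Bernoulli(theta), the
# posterior after n observations is Beta(1 + sum(X), 1 + n - sum(X)).
theta_true = 0.7
A = (0.65, 0.75)  # a fixed neighbourhood of theta_true

results = []
for n in [10, 100, 10_000]:
    x = rng.random(n) < theta_true              # simulate X_1, ..., X_n
    s = int(x.sum())
    # Monte Carlo estimate of Pi_n(A): draw from the Beta posterior.
    draws = rng.beta(1 + s, 1 + n - s, size=50_000)
    results.append(np.mean((draws > A[0]) & (draws < A[1])))
    print(f"n = {n:6d}:  Pi_n(A) ~ {results[-1]:.3f}")
# As n grows, Pi_n(A) -> delta_theta(A) = 1: Bayesian consistency.
```

The posterior $$\Pi_n$$ is random because it is computed from the random observations; here each run of the loop realizes one such random posterior.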

Theorem (Doob’s consistency): if an exchangeable model is identifiable, Bayesian consistency holds for this model.

### “Example of Integer(Poisson, Negative Binomial, etc.) support prior distribution”

• Great historical example: German tank problem
• Ecology: how many animals are in a region, based on capture-recapture data?
• Epidemiology: how many people infected from noisy reporting?
• Also, coming up soon: prior on the number of mixture components.
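As a sketch, the German tank posterior over an integer-valued $$N$$ can be computed by brute force; the truncated uniform prior, the cutoff `N_max`, and the observed serial numbers below are all illustrative assumptions:

```python
import numpy as np

# German tank problem: posterior over an integer-valued population size N.
# Observed serial numbers are i.i.d. Uniform on {1, ..., N}; N itself gets
# an integer-support prior (here a truncated uniform on 1..N_max).
def posterior_over_N(serials, N_max=1000):
    N = np.arange(1, N_max + 1)
    prior = np.ones(N_max) / N_max
    m, k = max(serials), len(serials)
    # Likelihood of k i.i.d. Uniform{1..N} draws: N^(-k), zero if N < max.
    like = np.where(N >= m, N.astype(float) ** (-k), 0.0)
    post = prior * like
    return N, post / post.sum()

serials = [10, 256, 202, 97]      # made-up observed tank serial numbers
N, post = posterior_over_N(serials)
print("posterior mode:", N[np.argmax(post)])   # = max(serials)
print("posterior mean:", round(float((N * post).sum()), 1))
```

Because the likelihood is decreasing in $$N$$ (on its support), the posterior mode sits at the largest observed serial number, while the posterior mean accounts for the possibility of unseen larger serials.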

# Question from me

• Peer reviewing system: can you access the reviewer comments? Any comments or proposed improvements?
• Other comments / proposed improvements?