Alexandre Bouchard-Côté

- We have invoked the “exchangeable/iid” setup several times already
- De Finetti's theorem will give us more insight into this class of models
- Motivation: often, you may not want to rely on the order of the rows in the spreadsheet given to you, so you may want to shuffle the rows to be safe.
- In such circumstances, de Finetti's theorem motivates the existence of a well-specified model with the graphical model on the right
- I.e., that there exists a parameter with a prior such that the data are iid given that parameter
- This motivation gives you an idea of the theorem, but it is not exactly what de Finetti's theorem says. Let's formalize the theorem!

- Recall: notion of equality in distribution \(X_1 {\stackrel{\scriptscriptstyle d}{=}}X_2\)
- …this means the *distribution* of \(X_1\) is equal to the distribution of \(X_2\)
- …concretely: draw the marginal PMF or density of each random variable, check if they match

- This cannot happen: \(X_1 {\stackrel{\scriptscriptstyle d}{=}}X_2\) if and only if \(X_1 = X_2\)
- You flip a fair coin and say what is on the top, \(X_1\). Your friend flips another coin, \(X_2\), with bias 1/3
- You flip a fair coin and say what is on the top, \(X_1\). Your friend says what is on the bottom, \(X_2\)

- You flip a fair coin and say what is on the top, \(X_1\). Your friend says what is on the top, \(X_2\)
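
The top/bottom scenario from the list above can be checked by simulation. The following is a minimal sketch, assuming NumPy is available; the variable names and the encoding (1 for heads, 0 for tails) are choices made here for illustration. It estimates the marginal PMF of each variable and counts how often the two variables take the same value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One fair coin per trial: 1 = heads on top, 0 = tails on top.
top = rng.integers(0, 2, size=n)
# The bottom face is always the opposite of the top face.
bottom = 1 - top

freq_top = top.mean()        # estimate of P(X_1 = 1)
freq_bottom = bottom.mean()  # estimate of P(X_2 = 1)
n_equal = int(np.sum(top == bottom))  # trials in which X_1 == X_2

print(freq_top, freq_bottom, n_equal)
```

Both estimated frequencies should be close to 1/2, so the marginal distributions match, yet the two variables never take the same value in any trial.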

- Two random variables are exchangeable if \((X_1, X_2) {\stackrel{\scriptscriptstyle d}{=}}(X_2, X_1)\)
- Example: indicator variables (i.e. Bernoulli random variables).

- It should be invariant to 90 degree rotation around the origin (clockwise)
- It should be invariant to 90 degree rotation around the origin (counter-clockwise)
- It should be symmetric with respect to the line \(y = -x\)
- It should be symmetric with respect to the line \(y = x\)

- Using our criterion (symmetry along the line \(y = x\)), we obtain that the indicators on a square, a circle, and a checkerboard centered at zero all lead to exchangeable random variables.

- Only the square is iid
- Only the circle is iid
- Only the checkerboard is iid
- All of them
- None of them

- Note: exchangeability implies identical distributions
- Note: this notion is closely related to reversibility
- Generalization: \(n\) variables are exchangeable if for all permutations \(\sigma \in S_n\), i.e. bijections \(\sigma : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}\), \((X_1, X_2, \dots, X_n) {\stackrel{\scriptscriptstyle d}{=}}(X_{\sigma(1)}, X_{\sigma(2)}, \dots, X_{\sigma(n)})\)
- Infinite exchangeability: an infinite sequence of random variables \((X_1, X_2, \dots)\) is exchangeable if every finite subset of its variables is exchangeable.
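
For discrete variables, the permutation criterion above can be checked directly on a joint PMF stored as an array. The sketch below is one possible way to do this, assuming NumPy; the helper names `is_exchangeable` and `iid_joint` and the example tables are hypothetical and introduced only for illustration. Permuting the variables corresponds to permuting the axes of the array.

```python
import itertools
import numpy as np

def is_exchangeable(joint, tol=1e-12):
    """joint[x_1, ..., x_n] = P(X_1 = x_1, ..., X_n = x_n).

    Returns True if the table is unchanged under every permutation of its axes.
    """
    n = joint.ndim
    return all(
        np.allclose(joint, np.transpose(joint, perm), atol=tol)
        for perm in itertools.permutations(range(n))
    )

def iid_joint(p, n):
    """Joint PMF of n iid Bernoulli(p) variables, as an array of shape (2,) * n."""
    marginal = np.array([1 - p, p])
    joint = marginal
    for _ in range(n - 1):
        joint = np.multiply.outer(joint, marginal)
    return joint

# Two variables: the table is symmetric along the diagonal.
sym = np.array([[0.1, 0.3],
                [0.3, 0.3]])
# Two variables: P(X_1 = 0, X_2 = 1) differs from P(X_1 = 1, X_2 = 0).
asym = np.array([[0.1, 0.5],
                 [0.1, 0.3]])
# Three variables: an equal-weight mixture of two iid Bernoulli models.
mix3 = 0.5 * iid_joint(0.2, 3) + 0.5 * iid_joint(0.9, 3)

print(is_exchangeable(sym), is_exchangeable(asym), is_exchangeable(mix3))
```

The check enumerates all \(n!\) permutations, so it is only practical for small \(n\).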

**Theorem:** if the infinite sequence \((X_1, X_2, \dots)\) of Bernoulli random variables is exchangeable, then there exist a parameter \(\theta\), a prior on \(\theta\), and a likelihood such that the observations are iid given \(\theta\).
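
The theorem concerns an infinite sequence, and a small numerical check can suggest why that matters. If two Bernoulli variables are iid given \(\theta\), then \(\mathrm{Cov}(X_1, X_2) = \mathrm{Var}(\theta) \ge 0\). The sketch below, assuming NumPy, compares a mixture of iid Bernoulli models with a finite exchangeable pair that has negative covariance and therefore cannot be written as such a mixture. The two-point prior and the urn example are hypothetical illustrations chosen here.

```python
import numpy as np

# Mixture of iid Bernoulli: theta equals thetas[j] with probability weights[j].
# (Hypothetical two-point prior, chosen only for illustration.)
thetas = np.array([0.2, 0.9])
weights = np.array([0.5, 0.5])

# joint_mix[a, b] = sum_j w_j * P(X_1 = a | theta_j) * P(X_2 = b | theta_j)
joint_mix = sum(
    w * np.outer([1 - t, t], [1 - t, t]) for w, t in zip(weights, thetas)
)

# Hypothetical finite example: two draws without replacement
# from an urn containing one ball labelled 0 and one labelled 1.
joint_urn = np.array([[0.0, 0.5],
                      [0.5, 0.0]])

def covariance(joint):
    """Cov(X_1, X_2) for a 2x2 joint PMF indexed as joint[x_1, x_2]."""
    p1 = joint[1, :].sum()  # P(X_1 = 1)
    p2 = joint[:, 1].sum()  # P(X_2 = 1)
    return joint[1, 1] - p1 * p2

cov_mix = covariance(joint_mix)
cov_urn = covariance(joint_urn)

print(cov_mix, cov_urn)
```

Both tables are symmetric, so both pairs are exchangeable, but only the mixture has non-negative covariance. The urn pair cannot be extended to an infinite exchangeable sequence.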

- Note: this is not guaranteed to hold for finite \(n\) (e.g., it works for the checkerboard, but is not possible for the circle)
- Leads to a philosophy: “Stochastic process based design of Bayesian models”
- Construct models containing an infinite number of observations a priori
- Condition on only the subset you actually observed
- Ideally, construct the model such that you can analytically marginalize the infinite suffix of unobserved data
- Note: it is trivial to marginalize a node \(X_i\) in a graphical model such that there is no directed path from \(X_i\) to any observation
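
The point about marginalizing the unobserved suffix can be illustrated on a small case. The sketch below assumes a Beta prior on \(\theta\) with iid Bernoulli observations given \(\theta\); this specific model and the helper names are assumptions made here, not something specified above. It compares the probability of an observed prefix computed directly with the same probability obtained by brute-force summation over every possible unobserved continuation.

```python
import itertools
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def seq_prob(xs, a=1.0, b=1.0):
    """Marginal probability of a binary sequence xs when
    theta ~ Beta(a, b) and x_i | theta are iid Bernoulli(theta)."""
    k = sum(xs)   # number of ones
    n = len(xs)   # sequence length
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))

observed = (1, 0, 1)

# Direct computation: ignore the unobserved variables altogether.
direct = seq_prob(observed)

# Brute force: sum the joint probability over all 2^4 continuations of length 4.
brute = sum(
    seq_prob(observed + suffix)
    for suffix in itertools.product((0, 1), repeat=4)
)

print(direct, brute)
```

The two numbers agree up to floating-point error, so the unobserved variables can simply be dropped from the computation. The same holds for any suffix length, which is what makes an infinite model usable in practice.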

- More background on exchangeability and de Finetti theorems:
- Notes from Tim Austin: https://www.math.ucla.edu/~tim/ExchnotesforIISc.pdf
- Applications to Bayesian modelling by Orbanz and Roy: https://arxiv.org/pdf/1312.7857.pdf

- State-of-the-art work developed by UBC faculty:
- Application to neural networks: https://www.jmlr.org/papers/v21/19-322.html
- Generalization of exchangeability: https://arxiv.org/abs/1906.09507