Alexandre Bouchard-Côté
3/11/2019
Today:
Updated schedule with some readings
Exercise 2:
Optional (Q2): implement a JAGS model for the change point data (last few slides today)
Theorem: if \((X_1, X_2, \dots)\) are exchangeable Bernoulli random variables, then there exists a \(\theta\), a prior on \(\theta\), and a likelihood such that the observations are iid given \(\theta\).
JAGS and related PPLs are very different than other languages you are used to:
Syntax almost the same as mathematical notation used in the lecture:
Z ~ dcat(hyperparameters)
p <- Z / 10where we will set hyperparameters to the vector of length 9, \((1/9, 1/9, \dots, 1/9)\)
Some quick references for convenience (see the tutorials for more context and information). The notation in JAGS is fairly similar to standard mathematical notation, but with some slight differences.
dbern(p)dbin(p, n)dcat(p)dpois(lambda)dnegbin(p, 1) support is \(0, 1, 2, \dots\)dnegbin(p, r) support is \(0, 1, 2, \dots\)X ~ dunif(a, b)X ~ dnorm(mu,1/sigma^2) uses the inverse of the variance, a parameter called the precisionX ~ dexp(lambda) rate parameterizationX ~ dgamma(alpha, lambda) shape-rate parameterizationabs(x)exp(x)ifelse(x,a,b) very useful to build complex models!log(x)sqrt(x)max(x, y) works on vectors and/or more than two argumentsmin(x, y)sum(x) sums the elements in the vector xTo get detailed information:
Context:
Data:
data <- read.csv("challenger_data-train.csv")
knitr::kable(data, floating.environment="sidewaystable")| Date | Temperature | Damage.Incident | 
|---|---|---|
| 04/12/1981 | 66 | 0 | 
| 11/12/1981 | 70 | 1 | 
| 3/22/82 | 69 | 0 | 
| 6/27/82 | 80 | NA | 
| 01/11/1982 | 68 | 0 | 
| 04/04/1983 | 67 | 0 | 
| 6/18/83 | 72 | 0 | 
| 8/30/83 | 73 | 0 | 
| 11/28/83 | 70 | 0 | 
| 02/03/1984 | 57 | 1 | 
| 04/06/1984 | 63 | 1 | 
| 8/30/84 | 70 | 1 | 
| 10/05/1984 | 78 | 0 | 
| 11/08/1984 | 67 | 0 | 
| 1/24/85 | 53 | 1 | 
| 04/12/1985 | 67 | 0 | 
| 4/29/85 | 75 | 0 | 
| 6/17/85 | 70 | 0 | 
| 7/29/85 | 81 | 0 | 
| 8/27/85 | 76 | 0 | 
| 10/03/1985 | 79 | 0 | 
| 10/30/85 | 75 | 1 | 
| 11/26/85 | 76 | 0 | 
| 01/12/1986 | 58 | 1 | 
Example adapted from: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb
Model: logistic regression. Input: temperature. Output: failure indicator (i.e. binary) variable.
Questions:
Approaches:
model {
  slope ~ dnorm(0, 0.001)
  intercept ~ dnorm(0, 0.001)
  for (i in 1:length(temperature)) {
    # incident[i] ~ dbern(1.0 / (1.0 + exp(- intercept - slope * temperature[i])))
    p[i] <- ilogit(intercept + slope * temperature[i])
    incident[i] ~ dbern(p[i])
  }
  # predictedPr <- 1.0 / (1.0 + exp(- intercept - slope * 31))
  predictedPr <- ilogit(intercept + slope * 31)
  predictedBinary ~ dbern(predictedPr)
}require("rjags")## Loading required package: rjags## Loading required package: coda## Linked to JAGS 4.2.0## Loaded modules: basemod,bugsrequire("coda")
require("ggplot2")
data <- read.csv( file = "challenger_data-train.csv", header = TRUE)
# Make the simulations reproducible
# 1 is arbitrary, but the rest of the simulation is 
# deterministic given that initial seed.
my.seed <- 1 
set.seed(my.seed) 
inits <- list(.RNG.name = "base::Mersenne-Twister",
              .RNG.seed = my.seed)
model <- jags.model(
  'model.bugs', 
  data = list( # Pass on the data:
    'incident' = data$Damage.Incident, 
    'temperature' = data$Temperature), 
  inits=inits) ## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 23
##    Unobserved stochastic nodes: 4
##    Total graph size: 111
## 
## Initializing modelsamples <- 
  coda.samples(model,
               c('slope', 'intercept', 'predictedPr', 'predictedBinary'), # These are the variables we want to monitor (plot, etc)
               100000) # number of MCMC iterations
summary(samples)## 
## Iterations = 1001:101000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                    Mean      SD  Naive SE Time-series SE
## intercept       19.8246 9.25184 0.0292569       0.803987
## predictedBinary  0.9930 0.08361 0.0002644       0.001392
## predictedPr      0.9925 0.04013 0.0001269       0.001341
## slope           -0.3032 0.13592 0.0004298       0.011746
## 
## 2. Quantiles for each variable:
## 
##                    2.5%     25%     50%     75%    97.5%
## intercept        5.6143 13.1947 18.4525 25.1527 41.85570
## predictedBinary  1.0000  1.0000  1.0000  1.0000  1.00000
## predictedPr      0.9329  0.9989  0.9999  1.0000  1.00000
## slope           -0.6264 -0.3816 -0.2830 -0.2057 -0.09434plot(samples)print(HPDinterval(samples))## [[1]]
##                      lower       upper
## intercept        4.0054009 38.42484799
## predictedBinary  1.0000000  1.00000000
## predictedPr      0.9723887  1.00000000
## slope           -0.5749028 -0.06893211
## attr(,"Probability")
## [1] 0.95# "Jailbreaking" from coda into a saner data frame
chain <- samples[[1]]
intercept <- chain[,1]
slope <- chain[,4]
df <- data.frame("intercept" = as.numeric(intercept), "slope" = as.numeric(slope))
summary(df)##    intercept          slope         
##  Min.   :-2.853   Min.   :-0.98311  
##  1st Qu.:13.195   1st Qu.:-0.38156  
##  Median :18.452   Median :-0.28303  
##  Mean   :19.825   Mean   :-0.30323  
##  3rd Qu.:25.153   3rd Qu.:-0.20573  
##  Max.   :66.041   Max.   : 0.02563ggplot(df, aes(intercept, slope)) +
  geom_density_2d()Some things to pay particular attention to:
In its early days, Google used a method in the spirit of what is described below to tweak their website. Let us say that they want to decide if they should use a white background or a black background (or some other cosmetic feature).
After a short interval of time, they record the following results:
Think about the following questions:
require("rjags")
require("coda")
require("ggplot2")
modelstring="
  model {
  
    pA ~ dunif(0, 1)
    pB ~ dunif(0, 1)
  
    sA ~ dbin(pA, nA)
    sB ~ dbin(pB, nB)
    
    AisBetterThanB <- ifelse(pA > pB, 1, 0)
  }
"
# Make the simulations reproducible
# 1 is arbitrary, but the rest of the simulation is 
# deterministic given that initial seed.
my.seed <- 1 
set.seed(my.seed) 
inits <- list(.RNG.name = "base::Mersenne-Twister",
              .RNG.seed = my.seed)
model <- jags.model(
  textConnection(modelstring), 
  data = list( # Pass on the data:
    'nA' = 51,
    'sA' = 45, 
    'nB' = 47,
    'sB' = 37), 
  inits=inits) ## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 2
##    Unobserved stochastic nodes: 2
##    Total graph size: 16
## 
## Initializing modelsamples <- 
  coda.samples(model,
               c('pA', 'pB', 'AisBetterThanB'), # These are the variables we want to monitor (plot, etc)
               10000) # number of MCMC iterations
summary(samples)## 
## Iterations = 1:10000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                  Mean      SD  Naive SE Time-series SE
## AisBetterThanB 0.8904 0.31241 0.0031241      0.0032970
## pA             0.8673 0.04641 0.0004641      0.0004566
## pB             0.7762 0.05888 0.0005888      0.0005689
## 
## 2. Quantiles for each variable:
## 
##                  2.5%    25%    50%    75%  97.5%
## AisBetterThanB 0.0000 1.0000 1.0000 1.0000 1.0000
## pA             0.7620 0.8394 0.8720 0.9007 0.9439
## pB             0.6529 0.7388 0.7799 0.8179 0.8803plot(samples)print(HPDinterval(samples))## [[1]]
##                    lower     upper
## AisBetterThanB 0.0000000 1.0000000
## pA             0.7743609 0.9493196
## pB             0.6578581 0.8831351
## attr(,"Probability")
## [1] 0.95chain <- samples[[1]]
pA <- chain[,2]
pB <- chain[,3]
df <- data.frame("pA" = as.numeric(pA), "pB" = as.numeric(pB))
summary(df)##        pA               pB        
##  Min.   :0.5934   Min.   :0.5235  
##  1st Qu.:0.8394   1st Qu.:0.7388  
##  Median :0.8720   Median :0.7799  
##  Mean   :0.8673   Mean   :0.7762  
##  3rd Qu.:0.9007   3rd Qu.:0.8179  
##  Max.   :0.9788   Max.   :0.9465ggplot(df, aes(pA, pB)) +
  xlim(0, 1) +
  ylim(0, 1) +
  geom_abline(intercept = 0, slope = 1) +
  geom_point(size = 0.3, alpha = 0.3)ggplot(df, aes(pA, pB)) +
  xlim(0, 1) +
  ylim(0, 1) +
  geom_abline(intercept = 0, slope = 1) +
  geom_density_2d()\[\frac{\text{# points in lower triangle}}{M} \approx {\mathbb{P}}(p_A > p_B | \text{data}) = \int \int_{x > y} f_{p_A, p_B|\text{data}}(x, y) {\text{d}}x {\text{d}}y\]
Example adapted from: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb
End of lecture