Alexandre Bouchard-Côté
3/11/2019
Today:
Updated schedule with some readings
Exercise 2:
Optional (Q2): implement a JAGS model for the change point data (last few slides today)
Theorem (de Finetti): if \((X_1, X_2, \dots)\) are exchangeable Bernoulli random variables, then there exist a random variable \(\theta\) and a prior distribution on \(\theta\) such that, given \(\theta\), the observations are iid \(\text{Bern}(\theta)\).
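In symbols, the theorem gives the representation (writing \(\pi\) for the prior on \(\theta\)): for all \(n\) and all \(x_1, \dots, x_n \in \{0, 1\}\),
\[{\mathbb{P}}(X_1 = x_1, \dots, X_n = x_n) = \int_0^1 \prod_{i=1}^n \theta^{x_i} (1 - \theta)^{1 - x_i} \, \pi({\text{d}}\theta)\]
i.e. the joint law is a mixture over \(\theta\) of iid \(\text{Bern}(\theta)\) laws.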
JAGS and related PPLs are very different from the other programming languages you are used to:
The syntax is almost the same as the mathematical notation used in the lecture:
Z ~ dcat(hyperparameters)
p <- Z / 10
where we will set hyperparameters to the vector of length 9, \((1/9, 1/9, \dots, 1/9)\)
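Wrapped in a model block, those two lines already form a complete (if tiny) JAGS program; here the observation X and its dbern likelihood are hypothetical additions just to make the sketch self-contained:
model {
  Z ~ dcat(hyperparameters)  # Z takes values in {1, ..., 9}, with P(Z = k) = hyperparameters[k]
  p <- Z / 10                # deterministic node: p takes values in {0.1, ..., 0.9}
  X ~ dbern(p)               # hypothetical Bernoulli observation with success probability p
}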
Some quick references for convenience (see the tutorials for more context and information). The notation in JAGS is fairly similar to standard mathematical notation, but with some slight differences.
JAGS syntax | Notes |
---|---|
dbern(p) | Bernoulli distribution |
dbin(p, n) | binomial distribution |
dcat(p) | categorical distribution; p is a vector of probabilities |
dpois(lambda) | Poisson distribution |
dnegbin(p, 1) | geometric distribution; support is \(0, 1, 2, \dots\) |
dnegbin(p, r) | negative binomial distribution; support is \(0, 1, 2, \dots\) |
X ~ dunif(a, b) | uniform distribution |
X ~ dnorm(mu, 1/sigma^2) | normal distribution; uses the inverse of the variance, a parameter called the precision |
X ~ dexp(lambda) | exponential distribution; rate parameterization |
X ~ dgamma(alpha, lambda) | gamma distribution; shape-rate parameterization |
abs(x) | absolute value |
exp(x) | exponential function |
ifelse(x, a, b) | very useful to build complex models! |
log(x) | natural logarithm |
sqrt(x) | square root |
max(x, y) | works on vectors and/or more than two arguments |
min(x, y) | minimum |
sum(x) | sums the elements in the vector x |
To get detailed information, see the JAGS user manual.
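One gotcha from the table worth internalizing: dnorm takes a precision (the inverse of the variance), not a standard deviation. A minimal sketch (the variable names tau and X are mine) of how to write a normal with mean 0 and standard deviation 2:
model {
  tau <- 1 / 2^2     # precision = 1/sigma^2 = 0.25
  X ~ dnorm(0, tau)  # Normal with mean 0, standard deviation 2
}
This is also why dnorm(0, 0.001) in the logistic regression model below is a vague prior: it is a normal with variance 1000.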
Context:
Data:
data <- read.csv("challenger_data-train.csv")
knitr::kable(data, floating.environment="sidewaystable")
Date | Temperature | Damage.Incident |
---|---|---|
04/12/1981 | 66 | 0 |
11/12/1981 | 70 | 1 |
3/22/82 | 69 | 0 |
6/27/82 | 80 | NA |
01/11/1982 | 68 | 0 |
04/04/1983 | 67 | 0 |
6/18/83 | 72 | 0 |
8/30/83 | 73 | 0 |
11/28/83 | 70 | 0 |
02/03/1984 | 57 | 1 |
04/06/1984 | 63 | 1 |
8/30/84 | 70 | 1 |
10/05/1984 | 78 | 0 |
11/08/1984 | 67 | 0 |
1/24/85 | 53 | 1 |
04/12/1985 | 67 | 0 |
4/29/85 | 75 | 0 |
6/17/85 | 70 | 0 |
7/29/85 | 81 | 0 |
8/27/85 | 76 | 0 |
10/03/1985 | 79 | 0 |
10/30/85 | 75 | 1 |
11/26/85 | 76 | 0 |
01/12/1986 | 58 | 1 |
Example adapted from: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb
Model: logistic regression. Input: temperature. Output: failure indicator (a binary variable).
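Spelled out, the model fit by the JAGS code below is the following (with \(t_i\) the temperature and \(y_i\) the failure indicator for launch \(i\); recall that dnorm takes a precision, so dnorm(0, 0.001) is a normal with variance 1000):
\[\text{intercept}, \text{slope} \sim N(0, 1000), \qquad p_i = \frac{1}{1 + e^{-(\text{intercept} + \text{slope} \cdot t_i)}}, \qquad y_i | p_i \sim \text{Bern}(p_i)\]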
Questions:
Approaches:
model {
  # Vague priors: dnorm's second argument is a precision,
  # so these are Normal(0, variance = 1000) priors
  slope ~ dnorm(0, 0.001)
  intercept ~ dnorm(0, 0.001)
  for (i in 1:length(temperature)) {
    # ilogit(x) = 1/(1 + exp(-x)), i.e. the commented-out line below:
    # incident[i] ~ dbern(1.0 / (1.0 + exp(- intercept - slope * temperature[i])))
    p[i] <- ilogit(intercept + slope * temperature[i])
    incident[i] ~ dbern(p[i])
  }
  # Posterior prediction at 31F, the temperature on the day of the Challenger launch
  # predictedPr <- 1.0 / (1.0 + exp(- intercept - slope * 31))
  predictedPr <- ilogit(intercept + slope * 31)
  predictedBinary ~ dbern(predictedPr)
}
require("rjags")
## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 4.2.0
## Loaded modules: basemod,bugs
require("coda")
require("ggplot2")
data <- read.csv( file = "challenger_data-train.csv", header = TRUE)
# Make the simulations reproducible
# 1 is arbitrary, but the rest of the simulation is
# deterministic given that initial seed.
my.seed <- 1
set.seed(my.seed)
inits <- list(.RNG.name = "base::Mersenne-Twister",
.RNG.seed = my.seed)
model <- jags.model(
'model.bugs',
data = list( # Pass on the data:
'incident' = data$Damage.Incident,
'temperature' = data$Temperature),
inits=inits)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 23
## Unobserved stochastic nodes: 4
## Total graph size: 111
##
## Initializing model
samples <-
coda.samples(model,
c('slope', 'intercept', 'predictedPr', 'predictedBinary'), # These are the variables we want to monitor (plot, etc)
100000) # number of MCMC iterations
summary(samples)
##
## Iterations = 1001:101000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1e+05
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## intercept 19.8246 9.25184 0.0292569 0.803987
## predictedBinary 0.9930 0.08361 0.0002644 0.001392
## predictedPr 0.9925 0.04013 0.0001269 0.001341
## slope -0.3032 0.13592 0.0004298 0.011746
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## intercept 5.6143 13.1947 18.4525 25.1527 41.85570
## predictedBinary 1.0000 1.0000 1.0000 1.0000 1.00000
## predictedPr 0.9329 0.9989 0.9999 1.0000 1.00000
## slope -0.6264 -0.3816 -0.2830 -0.2057 -0.09434
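Note that the Time-series SE of intercept and slope is more than an order of magnitude larger than the Naive SE, a sign of strong autocorrelation in the chain. coda can quantify this via the effective sample size (a quick check on the samples object above):
# Effective number of independent draws per monitored variable;
# much smaller than the 100000 iterations when the chain mixes slowly
effectiveSize(samples)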
plot(samples)
print(HPDinterval(samples))
## [[1]]
## lower upper
## intercept 4.0054009 38.42484799
## predictedBinary 1.0000000 1.00000000
## predictedPr 0.9723887 1.00000000
## slope -0.5749028 -0.06893211
## attr(,"Probability")
## [1] 0.95
# "Jailbreaking" from coda into a saner data frame
chain <- samples[[1]]
intercept <- chain[,1]
slope <- chain[,4]
df <- data.frame("intercept" = as.numeric(intercept), "slope" = as.numeric(slope))
summary(df)
## intercept slope
## Min. :-2.853 Min. :-0.98311
## 1st Qu.:13.195 1st Qu.:-0.38156
## Median :18.452 Median :-0.28303
## Mean :19.825 Mean :-0.30323
## 3rd Qu.:25.153 3rd Qu.:-0.20573
## Max. :66.041 Max. : 0.02563
ggplot(df, aes(intercept, slope)) +
geom_density_2d()
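One question is immediate to answer from these draws: the posterior probability that failures become less likely as temperature increases, \({\mathbb{P}}(\text{slope} < 0 | \text{data})\), estimated by the fraction of draws below zero (since the 97.5% quantile of slope above is already negative, this estimate will be close to 1):
# Monte Carlo estimate of P(slope < 0 | data), using the df built above
mean(df$slope < 0)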
Some things to pay particular attention to:
In its early days, Google used a method in the spirit of what is described below to tweak their website. Say they want to decide whether to use a white background or a black background (or to vary some other cosmetic feature).
After a short interval of time, they record the following results: \(s_A = 45\) successes out of \(n_A = 51\) trials for version A, and \(s_B = 37\) successes out of \(n_B = 47\) trials for version B (the counts passed to the JAGS model below).
Think about the following questions:
require("rjags")
require("coda")
require("ggplot2")
modelstring="
model {
  # Uniform priors on the success probabilities of the two versions
  pA ~ dunif(0, 1)
  pB ~ dunif(0, 1)
  # Binomial likelihoods: sA successes out of nA trials shown version A,
  # sB successes out of nB trials shown version B
  sA ~ dbin(pA, nA)
  sB ~ dbin(pB, nB)
  # Indicator variable; its posterior mean estimates P(pA > pB | data)
  AisBetterThanB <- ifelse(pA > pB, 1, 0)
}
"
# Make the simulations reproducible
# 1 is arbitrary, but the rest of the simulation is
# deterministic given that initial seed.
my.seed <- 1
set.seed(my.seed)
inits <- list(.RNG.name = "base::Mersenne-Twister",
.RNG.seed = my.seed)
model <- jags.model(
textConnection(modelstring),
data = list( # Pass on the data:
'nA' = 51,
'sA' = 45,
'nB' = 47,
'sB' = 37),
inits=inits)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 2
## Unobserved stochastic nodes: 2
## Total graph size: 16
##
## Initializing model
samples <-
coda.samples(model,
c('pA', 'pB', 'AisBetterThanB'), # These are the variables we want to monitor (plot, etc)
10000) # number of MCMC iterations
summary(samples)
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## AisBetterThanB 0.8904 0.31241 0.0031241 0.0032970
## pA 0.8673 0.04641 0.0004641 0.0004566
## pB 0.7762 0.05888 0.0005888 0.0005689
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## AisBetterThanB 0.0000 1.0000 1.0000 1.0000 1.0000
## pA 0.7620 0.8394 0.8720 0.9007 0.9439
## pB 0.6529 0.7388 0.7799 0.8179 0.8803
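As a sanity check: the \(\text{Unif}(0, 1)\) prior is a \(\text{Beta}(1, 1)\) distribution, which is conjugate to the binomial likelihood, so here the marginal posteriors are known in closed form, \(p_A | \text{data} \sim \text{Beta}(s_A + 1, n_A - s_A + 1)\) and similarly for \(p_B\). The closed-form posterior means match the MCMC estimates above:
# Posterior mean of a Beta(s + 1, n - s + 1) distribution: (s + 1) / (n + 2)
(45 + 1) / (51 + 2)  # pA: 46/53, approx. 0.868 (MCMC: 0.8673)
(37 + 1) / (47 + 2)  # pB: 38/49, approx. 0.776 (MCMC: 0.7762)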
plot(samples)
print(HPDinterval(samples))
## [[1]]
## lower upper
## AisBetterThanB 0.0000000 1.0000000
## pA 0.7743609 0.9493196
## pB 0.6578581 0.8831351
## attr(,"Probability")
## [1] 0.95
chain <- samples[[1]]
pA <- chain[,2]
pB <- chain[,3]
df <- data.frame("pA" = as.numeric(pA), "pB" = as.numeric(pB))
summary(df)
## pA pB
## Min. :0.5934 Min. :0.5235
## 1st Qu.:0.8394 1st Qu.:0.7388
## Median :0.8720 Median :0.7799
## Mean :0.8673 Mean :0.7762
## 3rd Qu.:0.9007 3rd Qu.:0.8179
## Max. :0.9788 Max. :0.9465
ggplot(df, aes(pA, pB)) +
xlim(0, 1) +
ylim(0, 1) +
geom_abline(intercept = 0, slope = 1) +
geom_point(size = 0.3, alpha = 0.3)
ggplot(df, aes(pA, pB)) +
xlim(0, 1) +
ylim(0, 1) +
geom_abline(intercept = 0, slope = 1) +
geom_density_2d()
\[\frac{\text{\# points in lower triangle}}{M} \approx {\mathbb{P}}(p_A > p_B | \text{data}) = \iint_{x > y} f_{p_A, p_B|\text{data}}(x, y) \, {\text{d}}x \, {\text{d}}y\]
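In code, this Monte Carlo estimate is just the fraction of the \(M\) posterior draws that fall in the lower triangle; it should agree with the posterior mean of AisBetterThanB reported above (about 0.89):
# Fraction of posterior draws with pA > pB
mean(df$pA > df$pB)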
Example adapted from: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb
End of lecture