Alexandre Bouchard-Côté
3/13/2019
Today:
Updated schedule with some readings
Recall: Exercise 2 due on Monday
Optional (Q2): implement a JAGS model for the change point data (last few slides today)
Approaches:
model {
  # Vague priors; note that dnorm's second argument is a precision in JAGS
  slope ~ dnorm(0, 0.001)
  intercept ~ dnorm(0, 0.001)
  for (i in 1:length(temperature)) {
    # incident[i] ~ dbern(1.0 / (1.0 + exp(- intercept - slope * temperature[i])))
    p[i] <- ilogit(intercept + slope * temperature[i])
    incident[i] ~ dbern(p[i])
  }
  # predictedPr <- 1.0 / (1.0 + exp(- intercept - slope * 31))
  predictedPr <- ilogit(intercept + slope * 31)
  predictedBinary ~ dbern(predictedPr)
}
require("rjags")
## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 4.2.0
## Loaded modules: basemod,bugs
require("coda")
require("ggplot2")
data <- read.csv( file = "challenger_data-train.csv", header = TRUE)
# Make the simulations reproducible
# 1 is arbitrary, but the rest of the simulation is
# deterministic given that initial seed.
my.seed <- 1
set.seed(my.seed)
inits <- list(.RNG.name = "base::Mersenne-Twister",
.RNG.seed = my.seed)
model <- jags.model(
'model.bugs',
data = list( # Pass on the data:
'incident' = data$Damage.Incident,
'temperature' = data$Temperature),
inits=inits)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 23
## Unobserved stochastic nodes: 4
## Total graph size: 111
##
## Initializing model
samples <-
coda.samples(model,
c('slope', 'intercept', 'predictedPr', 'predictedBinary'), # These are the variables we want to monitor (plot, etc)
100000) # number of MCMC iterations
summary(samples)
##
## Iterations = 1001:101000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1e+05
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## intercept 19.8246 9.25184 0.0292569 0.803987
## predictedBinary 0.9930 0.08361 0.0002644 0.001392
## predictedPr 0.9925 0.04013 0.0001269 0.001341
## slope -0.3032 0.13592 0.0004298 0.011746
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## intercept 5.6143 13.1947 18.4525 25.1527 41.85570
## predictedBinary 1.0000 1.0000 1.0000 1.0000 1.00000
## predictedPr 0.9329 0.9989 0.9999 1.0000 1.00000
## slope -0.6264 -0.3816 -0.2830 -0.2057 -0.09434
plot(samples)
print(HPDinterval(samples))
## [[1]]
## lower upper
## intercept 4.0054009 38.42484799
## predictedBinary 1.0000000 1.00000000
## predictedPr 0.9723887 1.00000000
## slope -0.5749028 -0.06893211
## attr(,"Probability")
## [1] 0.95
# "Jailbreaking" from coda into a saner data frame
chain <- samples[[1]]
# Columns follow the order shown in the summary above:
# intercept, predictedBinary, predictedPr, slope
intercept <- chain[,1]
slope <- chain[,4]
df <- data.frame("intercept" = as.numeric(intercept), "slope" = as.numeric(slope))
summary(df)
## intercept slope
## Min. :-2.853 Min. :-0.98311
## 1st Qu.:13.195 1st Qu.:-0.38156
## Median :18.452 Median :-0.28303
## Mean :19.825 Mean :-0.30323
## 3rd Qu.:25.153 3rd Qu.:-0.20573
## Max. :66.041 Max. : 0.02563
ggplot(df, aes(intercept, slope)) +
geom_density_2d()
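As a sanity check (this snippet is an addition, not part of the original code), the posterior probability of an incident at 31 degrees can be recovered directly from this data frame; it matches the posterior mean of predictedPr reported above, up to rounding:
# plogis is base R's inverse logit, i.e. the same function as ilogit in the JAGS model
mean(plogis(df$intercept + df$slope * 31))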
Some things to pay particular attention to:
In its early days, Google used a method in the spirit of what is described below to tweak their website. Say they want to decide whether to use a white background or a black background (or some other cosmetic feature).
After a short interval of time, they record the following results: 45 successes out of 51 visits for version A, and 37 successes out of 47 visits for version B (these are the counts nA, sA, nB, sB passed to JAGS below).
Think about the following questions:
require("rjags")
require("coda")
require("ggplot2")
modelstring="
model {
  pA ~ dunif(0, 1)
  pB ~ dunif(0, 1)
  sA ~ dbin(pA, nA)
  sB ~ dbin(pB, nB)
  AisBetterThanB <- ifelse(pA > pB, 1, 0)
}
"
# Make the simulations reproducible
# 1 is arbitrary, but the rest of the simulation is
# deterministic given that initial seed.
my.seed <- 1
set.seed(my.seed)
inits <- list(.RNG.name = "base::Mersenne-Twister",
.RNG.seed = my.seed)
model <- jags.model(
textConnection(modelstring),
data = list( # Pass on the data:
'nA' = 51,
'sA' = 45,
'nB' = 47,
'sB' = 37),
inits=inits)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 2
## Unobserved stochastic nodes: 2
## Total graph size: 16
##
## Initializing model
samples <-
coda.samples(model,
c('pA', 'pB', 'AisBetterThanB'), # These are the variables we want to monitor (plot, etc)
10000) # number of MCMC iterations
summary(samples)
##
## Iterations = 1:10000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## AisBetterThanB 0.8904 0.31241 0.0031241 0.0032970
## pA 0.8673 0.04641 0.0004641 0.0004566
## pB 0.7762 0.05888 0.0005888 0.0005689
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## AisBetterThanB 0.0000 1.0000 1.0000 1.0000 1.0000
## pA 0.7620 0.8394 0.8720 0.9007 0.9439
## pB 0.6529 0.7388 0.7799 0.8179 0.8803
plot(samples)
print(HPDinterval(samples))
## [[1]]
## lower upper
## AisBetterThanB 0.0000000 1.0000000
## pA 0.7743609 0.9493196
## pB 0.6578581 0.8831351
## attr(,"Probability")
## [1] 0.95
chain <- samples[[1]]
# Columns follow the order shown in the summary above: AisBetterThanB, pA, pB
pA <- chain[,2]
pB <- chain[,3]
df <- data.frame("pA" = as.numeric(pA), "pB" = as.numeric(pB))
summary(df)
## pA pB
## Min. :0.5934 Min. :0.5235
## 1st Qu.:0.8394 1st Qu.:0.7388
## Median :0.8720 Median :0.7799
## Mean :0.8673 Mean :0.7762
## 3rd Qu.:0.9007 3rd Qu.:0.8179
## Max. :0.9788 Max. :0.9465
ggplot(df, aes(pA, pB)) +
xlim(0, 1) +
ylim(0, 1) +
geom_abline(intercept = 0, slope = 1) +
geom_point(size = 0.3, alpha = 0.3)
ggplot(df, aes(pA, pB)) +
xlim(0, 1) +
ylim(0, 1) +
geom_abline(intercept = 0, slope = 1) +
geom_density_2d()
\[\frac{\text{\# points in lower triangle}}{M} \approx {\mathbb{P}}(p_A > p_B \mid \text{data}) = \iint_{x > y} f_{p_A, p_B \mid \text{data}}(x, y) \, {\text{d}}x \, {\text{d}}y\]
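The left-hand side is easy to compute from the data frame extracted above (this one-liner is an addition to the original code); up to Monte Carlo error it agrees with the posterior mean of AisBetterThanB reported in the summary (about 0.89):
# Fraction of posterior samples in the lower triangle, i.e. with pA > pB
mean(df$pA > df$pB)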
Example adapted from: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb
Motivation:
single_rocket_spec <-
"model {
  # The likelihood
  X ~ dbin(p,n)
  # Our prior
  p ~ dbeta(a,b)
}"
Key idea: use “side data” to inform the prior
require("dplyr") # provides the %>% pipe used below
data <- read.csv("failure_counts.csv")
data %>%
  head() %>%
  knitr::kable(floating.environment="sidewaystable")
LV.Type | numberOfLaunches | numberOfFailures |
---|---|---|
Aerobee | 1 | 0 |
Angara A5 | 1 | 0 |
Antares 110 | 2 | 0 |
Antares 120 | 2 | 0 |
Antares 130 | 1 | 1 |
Antares 230 | 1 | 0 |
First try
counts <- read.csv("failure_counts.csv")
ggplot(counts, aes(x = numberOfFailures / numberOfLaunches)) +
geom_histogram()
Recall:
In our case: just make \(\mu\) and \(s\) random! (or equivalently, \(\alpha\) and \(\beta\))
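For concreteness, here is a minimal JAGS sketch of one such hierarchical model, using the priors discussed just below and assuming the numberOfLaunches and numberOfFailures columns of failure_counts.csv are passed in as data (the exact model used in lecture may differ):
model {
  # Hyperpriors on the mean and concentration of the per-rocket failure probabilities
  mu ~ dbeta(1, 1)    # equivalently Unif(0, 1)
  s ~ dexp(1/10000)
  for (i in 1:length(numberOfLaunches)) {
    # Per-rocket failure probability; Beta(mu*s, (1-mu)*s) has mean mu
    p[i] ~ dbeta(mu * s, (1 - mu) * s)
    numberOfFailures[i] ~ dbin(p[i], numberOfLaunches[i])
  }
}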
It seems we have introduced a new problem: we again have hyperparameters, namely those of the priors on \(\mu\) and \(s\). Here we picked \(\mu \sim {\text{Beta}}(1,1) = {\text{Unif}}(0, 1)\) and \(s \sim \text{Exponential}(1/10000)\).
Key point: yes, but now we are less sensitive to these choices!
Why? Heuristic: say you have a random variable connected to some hyper-parameters (grey squares) and random variables connected to data (circles)
Before going hierarchical: for maiden/early flights we had
After going hierarchical:
CC: Cape Canaveral Launch (American); NIIP: Scientific Research Test Range (Russian: Nauchno-Issledovatel’skii Ispytatel’nyi Poligon)
full <- read.csv("processed.csv")
[1] " X X..Launch Launch.Date..UTC. COSPAR PL.Name Orig.PL.Name SATCAT LV.Type LV.S.N Site Suc Ref Suc_bin Family Space.Port Year Launch.Index"
[2] "----- ----------- --------------------- ------------- ----------------------------- -------------------------- ------- ---------------------- ---------------- -------------------------------- ---- --------------------- -------- --------------- ----------- ----- -------------"
[3] " 1 1957 ALP 1957 Oct 4 1928:34 1957 ALP 2 1-y ISZ PS-1 S00002 Sputnik 8K71PS M1-PS NIIP-5 LC1 S Energiya 1 Sputnik NIIP 1957 1"
[4] " 2 1957-U01 1957 Oct 17 0505 1957-U01 USAF 88 Charge A Poulter Pellet A08258 Aerobee USAF 88 HADC A S EngSci1.58 1 Aerobee HADC 1957 1"
[5] " 3 1957 BET 1957 Nov 3 0230:42 1957 BET 1 2-y ISZ PS-2 S00003 Sputnik 8K71PS M1-2PS NIIP-5 LC1 S Grahn-WWW 1 Sputnik NIIP 1957 2"
[6] " 4 1957-F01 1957 Dec 6 1644:35 1957-F01 Vanguard Vanguard Test Satellite F00002 Vanguard TV-3 CC LC18A F Vang-ER9948 0 Vanguard CC 1957 1"
Data looks like this:
Can you build a Bayesian model for that?
By type:
By interpretation:
random instead of param, and initialize it with ?: latentReal instead of fixedReal(3.5)
Key technique used by JAGS: slice sampling.
Why should we look under the hood? To understand the failure modes and how to address them.
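To make “under the hood” slightly more concrete, here is a minimal univariate slice sampler sketch in R (stepping out and shrinkage, in the spirit of Neal, 2003). The function and argument names are illustrative, and this is of course a simplification of what JAGS actually does:
# log_f: log density (up to an additive constant); x0: starting point;
# w: initial bracket width; n_iter: number of draws to return.
slice_sample <- function(log_f, x0, w = 1, n_iter = 1000) {
  out <- numeric(n_iter)
  x <- x0
  for (i in 1:n_iter) {
    # 1. Draw an auxiliary level uniformly under the density (in log space)
    log_y <- log_f(x) + log(runif(1))
    # 2. Randomly position an interval of width w around x, then step out
    #    until both endpoints fall outside the slice {x : log_f(x) > log_y}
    L <- x - runif(1) * w
    R <- L + w
    while (log_f(L) > log_y) L <- L - w
    while (log_f(R) > log_y) R <- R + w
    # 3. Sample uniformly on [L, R], shrinking the interval toward x on rejection
    repeat {
      x_new <- runif(1, L, R)
      if (log_f(x_new) > log_y) { x <- x_new; break }
      if (x_new < x) L <- x_new else R <- x_new
    }
    out[i] <- x
  }
  out
}
# Example: draw from a standard normal and check the first two moments
draws <- slice_sample(function(x) dnorm(x, log = TRUE), x0 = 0, w = 2, n_iter = 5000)
c(mean(draws), sd(draws))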
Things that the slice sampler has problems with (failure modes):
Solutions to these problems:
Stan: only works for continuous models
Blang: works for continuous and discrete models
We have done two thought experiments so far:
model Unidentifiable {
  ...
  laws {
    p ~ ContinuousUniform(0, 1)
    p2 ~ ContinuousUniform(0, 1)
    nFails | p, p2, nLaunches ~ Binomial(nLaunches, p * p2)
  }
}
Unidentifiability creates computational difficulties.
Example:
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  require(grid)
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  numPlots = length(plots)
  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }
  if (numPlots==1) {
    print(plots[[1]])
  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}
require(rjags)
require(coda)
require(ggplot2)
modelstring="
model {
  p1 ~ dunif(0, 1)
  p2 ~ dunif(0, 1)
  x ~ dbin(p1 * p2, n)
}
"
# Make the simulations reproducible
# 1 is arbitrary, but the rest of the simulation is
# deterministic given that initial seed.
plots <- list()
i <- 1
for (n in c(10,10000)) {
  local({
    my.seed <- 1
    set.seed(my.seed)
    inits <- list(.RNG.name = "base::Mersenne-Twister",
                  .RNG.seed = my.seed)
    model <- jags.model(
      textConnection(modelstring),
      data = list( # Pass on the data:
        'n' = n,
        'x' = n/2),
      inits=inits)
    samples <-
      coda.samples(model,
                   c('p1', 'p2'), # These are the variables we want to monitor (plot, etc)
                   100) # number of MCMC iterations
    chain <- samples[[1]]
    p1 <- chain[,1]
    p2 <- chain[,2]
    df <- data.frame("p1" = as.numeric(p1), "p2" = as.numeric(p2))
    p <- ggplot(df, aes(p1, p2)) +
      xlim(0, 1) +
      ylim(0, 1) +
      geom_path()
    plots[[i]] <<- p
  })
  i <- i + 1
}
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 2
## Total graph size: 9
##
## Initializing model
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 1
## Unobserved stochastic nodes: 2
## Total graph size: 9
##
## Initializing model
multiplot(plotlist = plots)
## Loading required package: grid
N <- 100000
components <- sample(1:3,prob=c(0.3,0.5,0.2),size=N,replace=TRUE)
mus <- c(0,10,3)
sds <- sqrt(c(1,1,0.1))
samples <- rnorm(n=N,mean=mus[components],sd=sds[components])
data <- data.frame(samples)
ggplot(data, aes(samples)) + geom_density() + theme_bw()
require(rjags)
require(ggplot2)
require(coda)
require(ggmcmc) # Note: nice package converting JAGS output into tidy format + ggplot-based trace/posterior histograms/etc
## Loading required package: ggmcmc
modelstring="
model {
  for (k in 1:2) {
    mu[k] ~ dnorm(0, .0001)   # dnorm's second argument is a precision in JAGS
    sigma[k] ~ dunif(0, 100)
    alpha[k] <- 1
  }
  pi ~ ddirch(alpha)
  for (n in 1:length(observation)) {
    z[n] ~ dcat(pi)
    observation[n] ~ dnorm(mu[z[n]], sigma[z[n]])
  }
}
"
data <- read.csv( file = "mixture_data_with_header.csv", header = TRUE)
# Make the simulations reproducible
my.seed <- 1 # 1 is arbitrary. The rest of the simulation is deterministic given that initial seed.
set.seed(my.seed)
inits <- list(.RNG.name = "base::Mersenne-Twister",
.RNG.seed = my.seed)
model <- jags.model(
textConnection(modelstring),
data = data,
inits = inits)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 300
## Unobserved stochastic nodes: 305
## Total graph size: 1216
##
## Initializing model
samples <-
coda.samples(model,
c('mu'), # These are the variables we want to monitor (plot, compute expectations, etc)
10000) # number of MCMC iterations
tidy <- ggs(samples)
ggs_traceplot(tidy) + ylim(100,210) + theme_bw()
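To go beyond eyeballing the trace plot, one can also ask coda for the effective sample size of each monitored quantity; this diagnostic line is an addition to the original code:
effectiveSize(samples) # effective number of roughly independent draws for mu[1] and mu[2]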
Question: how to design a sampler for a uniform distribution on a set \(A\), assuming only pointwise evaluation?
Recall: pointwise evaluation means all you can do is: “given \(x\), answer if \(x \in A\)”.
Example: \(A = \{0, 1, 2\}\)
Idea: use a random walk so that it scales to complicated/high dimensional \(A\)
Move to one neighbor at random.
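As a quick sanity check (this simulation is an addition to the notes), here is the naive walk on A = {0, 1, 2} that moves to a uniformly chosen neighbor; its long-run visit frequencies are not uniform, which the stationary-distribution computation below makes precise:
set.seed(1)
n_steps <- 100000
x <- 0                      # current state; the set is A = {0, 1, 2}
visits <- integer(n_steps)
for (t in 1:n_steps) {
  # From 0 or 2 the only neighbor is 1; from 1, pick 0 or 2 uniformly at random
  x <- if (x == 1) sample(c(0, 2), 1) else 1
  visits[t] <- x
}
table(visits) / n_steps     # roughly 0.25, 0.50, 0.25 -- not uniform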
compute_stationary_distribution <- function(matrix) {
  eig <- eigen(t(matrix))
  eigenvalues <- eig$values
  i <- which.max(Re(eigenvalues[abs(Im(eigenvalues)) < 1e-6]))
  statio <- (eig$vectors)[,i]
  return(statio)
}
M = matrix(c(
0, 1, 0,
1/2, 0, 1/2,
0, 1, 0
), nrow=3, ncol=3, byrow=T)
barplot(compute_stationary_distribution(M))
M = matrix(c(
1/2, 1/2, 0,
1/2, 0, 1/2,
0, 1/2, 1/2
), nrow=3, ncol=3, byrow=T)
barplot(compute_stationary_distribution(M))
Note: the fixed point equation is the same as the equation defining eigenvectors and eigenvalues (up to transpose). This explains how I implemented the function compute_stationary_distribution in the earlier R code:
print(compute_stationary_distribution)
## function(matrix) {
##   eig <- eigen(t(matrix))
##   eigenvalues <- eig$values
##   i <- which.max(Re(eigenvalues[abs(Im(eigenvalues)) < 1e-6]))
##   statio <- (eig$vectors)[,i]
##   return(statio)
## }
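In symbols, writing \(\pi\) for the stationary distribution as a row vector, the fixed point equation is
\[\pi M = \pi \quad \Longleftrightarrow \quad M^{\top} \pi^{\top} = \pi^{\top},\]
so \(\pi^{\top}\) is an eigenvector of \(M^{\top}\) with eigenvalue 1 (the largest real eigenvalue of a transition matrix), which is exactly what the code extracts. Note that eigen returns an unnormalized eigenvector, so it should be divided by its sum if actual probabilities are needed.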