Regression

Alexandre Bouchard-Côté

Overview

Moving towards more interesting models: challenger disaster and GLMs

drawing

Context:

Data:

drawing

data <- read.csv("challenger_data-train.csv")
knitr::kable(data, floating.environment="sidewaystable")
Date Temperature Damage.Incident
04/12/1981 66 0
11/12/1981 70 1
3/22/82 69 0
6/27/82 80 NA
01/11/1982 68 0
04/04/1983 67 0
6/18/83 72 0
8/30/83 73 0
11/28/83 70 0
02/03/1984 57 1
04/06/1984 63 1
8/30/84 70 1
10/05/1984 78 0
11/08/1984 67 0
1/24/85 53 1
04/12/1985 67 0
4/29/85 75 0
6/17/85 70 0
7/29/85 81 0
8/27/85 76 0
10/03/1985 79 0
10/30/85 75 1
11/26/85 76 0
01/12/1986 58 1

Example adapted from: http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb

Regression setup

Regression models (Bayesian and non-Bayesian) involve the following quantities:

Bayesian approach

Model: logistic regression. Input: temperature. Output: failure indicator (i.e. binary) variable.

Questions:

Graphical model and Blang implementation

Blang implementation

drawing

Alternative: JAGS implementation

model {

  slope ~ dnorm(0, 0.001)
  intercept ~ dnorm(0, 0.001)

  for (i in 1:length(temperature)) {
    # incident[i] ~ dbern(1.0 / (1.0 + exp(- intercept - slope * temperature[i])))
    p[i] <- ilogit(intercept + slope * temperature[i])
    incident[i] ~ dbern(p[i])
  }

  # predictedPr <- 1.0 / (1.0 + exp(- intercept - slope * 31))
  predictedPr <- ilogit(intercept + slope * 31)
  predictedBinary ~ dbern(predictedPr)
}
require("rjags")
require("coda")
require("ggplot2")

data <- read.csv( file = "challenger_data-train.csv", header = TRUE)

 # Make the simulations reproducible
 # 1 is arbitrary, but the rest of the simulation is 
 # deterministic given that initial seed.
my.seed <- 1 
set.seed(my.seed) 
inits <- list(.RNG.name = "base::Mersenne-Twister",
              .RNG.seed = my.seed)

model <- jags.model(
  'model.bugs', 
  data = list( # Pass on the data:
    'incident' = data$Damage.Incident, 
    'temperature' = data$Temperature), 
  inits=inits) 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 23
##    Unobserved stochastic nodes: 4
##    Total graph size: 111
## 
## Initializing model
samples <- 
  coda.samples(model,
               c('slope', 'intercept', 'predictedPr', 'predictedBinary'), # These are the variables we want to monitor (plot, etc)
               100000) # number of MCMC iterations 100000
summary(samples)
## 
## Iterations = 1001:101000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                    Mean      SD  Naive SE Time-series SE
## intercept       19.8246 9.25184 0.0292569       0.803987
## predictedBinary  0.9930 0.08361 0.0002644       0.001392
## predictedPr      0.9925 0.04013 0.0001269       0.001341
## slope           -0.3032 0.13592 0.0004298       0.011746
## 
## 2. Quantiles for each variable:
## 
##                    2.5%     25%     50%     75%    97.5%
## intercept        5.6143 13.1947 18.4525 25.1527 41.85570
## predictedBinary  1.0000  1.0000  1.0000  1.0000  1.00000
## predictedPr      0.9329  0.9989  0.9999  1.0000  1.00000
## slope           -0.6264 -0.3816 -0.2830 -0.2057 -0.09434
plot(samples)

print(HPDinterval(samples))
## [[1]]
##                      lower       upper
## intercept        4.0054009 38.42484799
## predictedBinary  1.0000000  1.00000000
## predictedPr      0.9723887  1.00000000
## slope           -0.5749028 -0.06893211
## attr(,"Probability")
## [1] 0.95
 # "Jailbreaking" from coda into a saner data frame
chain <- samples[[1]]
intercept <- chain[,1]
slope <- chain[,4]
df <- data.frame("intercept" = as.numeric(intercept), "slope" = as.numeric(slope))
summary(df)
##    intercept          slope         
##  Min.   :-2.853   Min.   :-0.98311  
##  1st Qu.:13.195   1st Qu.:-0.38156  
##  Median :18.452   Median :-0.28303  
##  Mean   :19.825   Mean   :-0.30323  
##  3rd Qu.:25.153   3rd Qu.:-0.20573  
##  Max.   :66.041   Max.   : 0.02563
ggplot(df, aes(intercept, slope)) +
  geom_density_2d()

For a more modern way of using the output of JAGS: http://mjskay.github.io/tidybayes/

Linking to Bayes estimation

Recall our Bayes estimation framework from last week:

\[{\textrm{argmin}}\{{\mathbf{E}}[L(a, Z)|X], a \in {\mathcal{A}}\}\]

Poll: How to cast regression into our Bayes estimation framework?

  1. Any assignments of \(Z\) and \(X\) is possible
  2. \(Z\) contains test outputs; while \(X\) contains all others
  3. \(Z\) contains parameters; while \(X\) contains all others
  4. \(Z\) contains parameters, test outputs; while \(X\) contains training outputs, and all inputs
  5. \(Z\) contains parameters, test outputs; while \(X\) contains training outputs

Shortcut

Prediction

Poll: what is the Bayesian approach to this prediction problem

  1. Compute the posterior mean of the slope and intercept, \(\hat a_\text{mean}\) and \(\hat b_\text{mean}\), and return \(\text{logistic}(31 \hat a_\text{mean}+ \hat b_\text{mean})\)
  2. Compute the MAP of the slope and intercept, \(\hat a_\text{MAP}\) and \(\hat b_\text{MAP}\), and return \(\text{logistic}(31 \hat a_\text{MAP}+ \hat b_\text{MAP})\)
  3. Compute the posterior median of the slope and intercept, \(\hat a_\text{med}\) and \(\hat b_\text{med}\), and return \(\text{logistic}(31 \hat a_\text{med}+ \hat b_\text{med})\)
  4. None of the above

Prediction

Zero-one loss

\[ \begin{aligned} \delta^*(X) &= {\textrm{argmin}}\{ {\mathbf{E}}[L(a, Z) | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbf{E}}[{{\bf 1}}[Y \neq a] | X] : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ {\mathbb{P}}(Y \neq a | X) : a \in {\mathcal{A}}\} \\ &= {\textrm{argmin}}\{ 1 - {\mathbb{P}}(Y = a | X) : a \in {\mathcal{A}}\} \\ &= {\textrm{argmax}}\{ {\mathbb{P}}(Y = a | X) : a \in {\mathcal{A}}\} \\ &= \text{MAP estimator} \end{aligned} \]

So we obtain a MAP estimator, but on the predicted binary variable directly (contrast with the MLE plug-in approach, which maximizes continuous parameters first, then plug-in these estimates to predict)

GLMs: general framework

\[{\text{output}}| {\text{inputs}}, {\text{parameter}}\sim \text{SuitableDistribution}(f({\text{parameter}}, {\text{inputs}}))\]

Sparsity and model selection

Bayes estimator

The Bayes estimator is

\[{\textrm{argmax}}\{{\mathbb{P}}(\text{slope}=a|X) : a \in \{0, 1\}\}\]

Poll: would you use the Bayes estimator to select between the two models?

  1. Yes, this is the optimal approach
  2. No, instead I would look if the credible intervals on the slope contains 0
  3. No, because this would always select the more complex model

Difficulty

\[{\mathbb{P}}(A) = 0 \Longrightarrow {\mathbb{P}}(A | B) = \frac{{\mathbb{P}}(A \cap B)}{{\mathbb{P}}(B)} \le \frac{{\mathbb{P}}(A)}{{\mathbb{P}}(B)} = 0 \]

Solution: change the prior to put positive mass on the event (slope = 0)

Blang implementation

Readings

Start reading about one or two PPLs, e.g.:

JAGS version of Spike-and-Slab

model.name <- "spike-slab.bugs"
{ # Extra bracket needed only for R markdown files
sink(model.name)
cat(" 
model {

  slope ~ dnorm(0, 0.001)
  intercept ~ dnorm(0, 0.001)
  selected ~ dbern(0.5)

  for (i in 1:length(temperature)) {
    # incident[i] ~ dbern(1.0 / (1.0 + exp(- intercept - slope * temperature[i])))
    p[i] <- ilogit(intercept + selected * slope * temperature[i])
    incident[i] ~ dbern(p[i])
  }

  # predictedPr <- 1.0 / (1.0 + exp(- intercept - slope * 31))
  predictedPr <- ilogit(intercept + selected * slope * 31)
  predictedBinary ~ dbern(predictedPr)
}

",fill = TRUE)
sink()
} # Extra bracket needed only for R markdown files

require("rjags")
require("coda")
require("ggplot2")

data <- read.csv( file = "challenger_data-train.csv", header = TRUE)

 # Make the simulations reproducible
 # 1 is arbitrary, but the rest of the simulation is 
 # deterministic given that initial seed.
my.seed <- 1 
set.seed(my.seed) 
inits <- list(.RNG.name = "base::Mersenne-Twister",
              .RNG.seed = my.seed)

model <- jags.model(
  model.name, 
  data = list( # Pass on the data:
    'incident' = data$Damage.Incident, 
    'temperature' = data$Temperature), 
  inits=inits) 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 23
##    Unobserved stochastic nodes: 5
##    Total graph size: 113
## 
## Initializing model
samples <- 
  coda.samples(model,
               c('slope', 'intercept', 'predictedPr', 'predictedBinary'), # These are the variables we want to monitor (plot, etc)
               100000) # number of MCMC iterations 100000
summary(samples)
## 
## Iterations = 1001:101000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                    Mean      SD  Naive SE Time-series SE
## intercept        3.0611  8.8648 0.0280329        1.58688
## predictedBinary  0.4461  0.4971 0.0015719        0.02981
## predictedPr      0.4450  0.2887 0.0009129        0.07922
## slope           -0.0205 28.1810 0.0891162        0.11244
## 
## 2. Quantiles for each variable:
## 
##                     2.5%      25%     50%     75% 97.5%
## intercept        -1.7724  -1.0794 -0.6995 -0.1377 28.65
## predictedBinary   0.0000   0.0000  0.0000  1.0000  1.00
## predictedPr       0.1452   0.2536  0.3319  0.4651  1.00
## slope           -59.1470 -15.1741 -0.2526 15.2250 58.56
plot(samples)