### Tutorial on Bayesian regression (GLMs)

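In this tutorial, we walk through an example of Bayesian regression in action. To illustrate the concepts, we use the well-known Challenger dataset, which records, for each space shuttle launch preceding the Challenger accident, the launch temperature and whether an O-ring incident occurred.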

We start with a classical Generalized Linear Model (GLM), namely logistic regression.

ex.reg.Challenger
    package ex.reg

    model Challenger {
      random RealVar intercept ?: latentReal
      random RealVar slope ?: latentReal
      random IntVar prediction ?: latentInt
      random List<IntVar> incidents
      param List<RealVar> temperatures

      laws {
        intercept ~ Normal(0.0, 100.0)
        slope ~ Normal(0.0, 100.0)
        for (int i : 0 ..< incidents.size) {
          incidents.get(i) | RealVar temperature = temperatures.get(i), intercept, slope ~ Bernoulli(logistic(intercept + slope * temperature))
        }
        prediction | intercept, slope ~ Bernoulli(logistic(intercept + slope * 31))
      }
    }
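
For readers who prefer mathematical notation, the model above can be written as follows. The symbols are introduced here for illustration and do not appear in the original code: $\alpha$ stands for `intercept`, $\beta$ for `slope`, $t_i$ for the $i$-th temperature, $y_i$ for the $i$-th incident indicator, and $\tilde{y}$ for `prediction`. The parameters of `Normal` are copied as written in the model code.

$$
\begin{aligned}
\alpha &\sim \mathrm{Normal}(0, 100) \\
\beta &\sim \mathrm{Normal}(0, 100) \\
y_i \mid \alpha, \beta &\sim \mathrm{Bernoulli}\big(\mathrm{logistic}(\alpha + \beta\, t_i)\big), \quad i = 0, \dots, n - 1 \\
\tilde{y} \mid \alpha, \beta &\sim \mathrm{Bernoulli}\big(\mathrm{logistic}(\alpha + 31\, \beta)\big)
\end{aligned}
\qquad \text{where} \quad \mathrm{logistic}(x) = \frac{1}{1 + e^{-x}}
$$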

Protip: for more information on useful functions (e.g. logistic), see the Quick reference for functions.
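
To make the role of the logistic function concrete, here is a minimal Python sketch (not Blang code) of how an intercept, a slope, and a temperature are turned into an incident probability. The function names and the parameter values are hypothetical, chosen only for illustration. They are not estimates obtained from the Challenger data.

```python
import math


def logistic(x: float) -> float:
    """Map any real number to a probability in (0, 1)."""
    # Two branches so that math.exp is never called on a large positive value.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def incident_probability(intercept: float, slope: float, temperature: float) -> float:
    """Bernoulli success probability used in the likelihood of the model."""
    return logistic(intercept + slope * temperature)


# Hypothetical parameter values, for illustration only.
p_cold = incident_probability(intercept=5.0, slope=-0.1, temperature=31.0)
p_warm = incident_probability(intercept=5.0, slope=-0.1, temperature=75.0)
print(p_cold, p_warm)
```

With a negative slope, the incident probability decreases as the temperature increases. With a slope of exactly zero, it is constant in the temperature.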

Is the slope parameter really useful? One way to tackle this question is to use a Spike-and-Slab model:

ex.reg.ChallengerSpiked
    package ex.reg

    import blang.types.SpikedRealVar

    model ChallengerSpiked {
      random RealVar intercept ?: latentReal
      random SpikedRealVar slope ?: new SpikedRealVar
      random IntVar prediction ?: latentInt
      random List<IntVar> incidents
      param List<RealVar> temperatures

      laws {
        intercept ~ Normal(0.0, 100.0)
        slope.continuousPart ~ Normal(0.0, 100.0)
        slope.selected ~ Bernoulli(0.5)
        for (int i : 0 ..< incidents.size) {
          incidents.get(i) | RealVar temperature = temperatures.get(i), intercept, slope ~ Bernoulli(logistic(intercept + slope.doubleValue * temperature))
        }
        prediction | intercept, slope ~ Bernoulli(logistic(intercept + slope.doubleValue * 31))
      }
    }
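
In mathematical notation, the prior on the slope in this model can be sketched as follows. The symbols are introduced here for illustration: $\gamma$ stands for `slope.selected`, $\beta_c$ for `slope.continuousPart`, and $\beta$ for `slope.doubleValue`. This assumes that `doubleValue` equals zero when `selected` is zero and equals the continuous part otherwise.

$$
\gamma \sim \mathrm{Bernoulli}(0.5), \qquad
\beta_c \sim \mathrm{Normal}(0, 100), \qquad
\beta = \gamma\, \beta_c
$$

Under this assumption, $\gamma = 0$ corresponds to a likelihood that is constant in the temperature (the "spike" at zero), and $\gamma = 1$ to a likelihood with a linear term in the temperature (the "slab").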

If you are curious about how the Spike-and-Slab variable is implemented: both a real and an integer are always maintained in memory. This technique is an example of "model saturation" (Carlin and Chib, 1995; or The Bayesian Choice, 2nd edition, section 7.3.3).
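
The idea of keeping both a real and an integer in memory can be sketched in a few lines of Python. This is not the Blang implementation. The class `SpikedReal` and its members are hypothetical names that mirror `selected`, `continuousPart`, and `doubleValue` from the model above, under the assumption that the effective value is zero whenever the indicator is zero.

```python
from dataclasses import dataclass


@dataclass
class SpikedReal:
    """Toy stand-in for a spike-and-slab variable (hypothetical, not Blang's class)."""

    selected: int = 0             # 1 if the continuous part is active, 0 otherwise
    continuous_part: float = 0.0  # always kept in memory, even when not selected

    def double_value(self) -> float:
        # Effective value seen by the likelihood: exactly zero when not selected.
        return self.continuous_part if self.selected == 1 else 0.0


slope = SpikedReal(selected=0, continuous_part=-0.2)
print(slope.double_value())  # the spike: the continuous part is ignored

slope.selected = 1
print(slope.double_value())  # the slab: the continuous part is used
```

Because the continuous part is stored even when it is not selected, a sampler can update the indicator and the continuous part separately without changing the dimension of the state.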

More broadly, for information on random variable types (e.g. IntVar, RealVar, and how to initialize them), follow this link

Exercise 1

Use the posterior distribution to determine if this model leans towards supporting a constant or linear function in the likelihood.
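
One possible way to approach this, sketched here in Python rather than Blang, is to estimate the posterior probability that the slope is selected as the fraction of posterior samples in which the indicator equals 1. The function names and the list of indicator values below are hypothetical. The list is a made-up toy input, not output from the model, and how to load actual samples depends on your setup.

```python
def inclusion_probability(selected_samples):
    """Fraction of posterior samples in which the slope indicator equals 1."""
    if not selected_samples:
        raise ValueError("need at least one posterior sample")
    return sum(selected_samples) / len(selected_samples)


def posterior_odds(selected_samples):
    """Posterior odds of the linear model (indicator 1) against the constant one."""
    p = inclusion_probability(selected_samples)
    return float("inf") if p == 1.0 else p / (1.0 - p)


# Made-up indicator values, standing in for samples of slope.selected.
toy_samples = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(inclusion_probability(toy_samples))
print(posterior_odds(toy_samples))
```

An estimate above 0.5 leans towards the linear function, and an estimate below 0.5 towards the constant one. Since the prior on the indicator is Bernoulli(0.5), the prior odds are 1, so the posterior odds can also be read as a Bayes factor.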

Let us change the prior a little bit. The only change is in the prior on the slope: the second parameter of the Normal distribution on slope.continuousPart goes from 100.0 to 10.0.

ex.reg.ChallengerSpiked2
    package ex.reg

    import blang.types.SpikedRealVar

    model ChallengerSpiked2 {
      random RealVar intercept ?: latentReal
      random SpikedRealVar slope ?: new SpikedRealVar
      random IntVar prediction ?: latentInt
      random List<IntVar> incidents
      param List<RealVar> temperatures

      laws {
        intercept ~ Normal(0.0, 100.0)
        slope.continuousPart ~ Normal(0.0, 10.0)
        slope.selected ~ Bernoulli(0.5)
        for (int i : 0 ..< incidents.size) {
          incidents.get(i) | RealVar temperature = temperatures.get(i), intercept, slope ~ Bernoulli(logistic(intercept + slope.doubleValue * temperature))
        }
        prediction | intercept, slope ~ Bernoulli(logistic(intercept + slope.doubleValue * 31))
      }
    }

Exercise 2

Does the conclusion from Exercise 1 change under this new prior distribution?

Exercise 3

(Open ended) How would you proceed to set the prior distributions in this application example?