This webpage provides information, notes and code for the series of
workshops given by Gavin Shaddick and held at the Department of
Statistics, University of British Columbia.
Contact:
gavin@stat.ubc.ca
The workshops are:
1. A series of four BRG tutorials on spatial epidemiology.
2. An all day workshop on WinBUGS. CLICK HERE FOR WORKSHOP WEBPAGE
3. Two extended seminar/tutorials on the use of Bayesian hierarchical models.
NEWS:
29/09/08: Registration for the WinBUGS workshop is now
closed and an email has been sent to all participants regarding
payment. If you registered but did not receive this email, please
contact elaine@stat.ubc.ca. Further details will be sent around next
week. There will be a separate website for this course, which can be
found here.
26/10/08: Notes, code and data for the workshops in Bayesian
hierarchical modelling are now available and can be found below. Hopefully we will
get some time to actually run some of the models (in WinBUGS), so
please feel free to bring along your laptop.
SECTION 1: BIOSTATISTICS RESEARCH SEMINAR
DATES: Thursdays - 25 September, 2, 9 and 16 October 2008 (4 sessions)
PLACE: Room 301, Leonard S. Klinck Building, 6356 Agricultural Road, UBC
TIME: 4:00 p.m.
SPEAKER: Dr. Gavin Shaddick, University of Bath, UK
TITLE: Introduction to Spatial Epidemiology
ABSTRACT
Epidemiology is the study of the distribution, causes and control of
diseases in human populations. The risk of a disease may be determined
by many factors, but these can largely be categorised as factors
specific to individuals, factors relating to time and factors relating
to space. Here we consider the third of these, location or place; in
this sense, space is a surrogate for exposures present at a
specific location, e.g. environmental exposures in water/air/soil, or
the lifestyle characteristics of those living in particular areas.
In this series of four tutorial-style seminars, a general introduction
to spatial epidemiology will be presented together with an overview of
a number of topics within the area, such as disease mapping, spatial
regression and cluster detection, as time permits. We concentrate on the
topic of disease mapping, in which we aim to provide information on a
measure of disease occurrence across space. Often the results
from such studies are presented in the form of maps of disease
occurrence after the spatial dependence in the data has been exploited
in order to produce ‘smoothed’ rates and provide better
predictions.
Throughout the series, examples from epidemiological studies will be
used to illustrate ideas with implementation using R. A brief
introduction to the use of WinBUGS when using Bayesian methods will
also be given. Familiarity with the basic principles of generalised
linear regression models will be assumed together with familiarity with
common probability distributions, e.g. normal, binomial, Poisson.
A basic understanding of the principles of Bayesian analysis would be
desirable, although to this end preliminary reading can be suggested.
Thursday, 25 September 2008
Session 1: Introduction to spatial epidemiology.
In this first session, an overview of spatial epidemiological studies
will be given, concentrating on real examples from the
literature. The basic concepts of epidemiological research will
be introduced, including definitions of disease occurrence, types of
observational studies, confounding, standardisation and the use of
geographical information systems (GIS).
Notes:
A very basic introduction to epidemiology (powerpoint file)
A review of epidemiology (pdf file), Brief version (slides, pdf file)
Thursday, 2 October 2008
Session 2: Statistical overview
In this second session, we give a statistical overview and put many of
the concepts introduced in the first session into the form of
statistical regression models. Estimation of parameters is discussed,
primarily using the Poisson case as an example, in terms of
likelihood, quasi-likelihood and Bayesian methods. Examples will be
given, with implementation in R.
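As a minimal sketch of the likelihood approach in the Poisson case (written in Python purely for illustration; it is not the workshop's R code), suppose each area i has an observed count O_i and an expected count E_i, with O_i ~ Poisson(E_i * theta) for a single common relative risk theta. The common-risk model and the counts below are assumptions made here, not taken from the notes. Under this model the maximum likelihood estimate has the closed form sum(O) / sum(E).

```python
import math

# Hypothetical observed and expected counts for five areas (illustrative only).
observed = [3, 7, 12, 5, 9]
expected = [4.2, 6.1, 10.5, 5.9, 7.3]


def poisson_loglik(theta, observed, expected):
    """Log-likelihood of a common relative risk theta when O_i ~ Poisson(E_i * theta)."""
    return sum(
        o * math.log(e * theta) - e * theta - math.lgamma(o + 1)
        for o, e in zip(observed, expected)
    )


def mle_relative_risk(observed, expected):
    """Closed-form maximiser of the log-likelihood above: sum(O) / sum(E)."""
    return sum(observed) / sum(expected)


theta_hat = mle_relative_risk(observed, expected)
print(f"MLE of relative risk: {theta_hat:.3f}")
print(f"Log-likelihood at the MLE: {poisson_loglik(theta_hat, observed, expected):.3f}")
```

Evaluating `poisson_loglik` at values either side of `theta_hat` gives a quick check that the closed form does sit at the maximum.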
Notes:
Spatial epidemiology notes, part 1 (pdf file)
Exercises 1 (pdf file)
Solutions 1 (pdf file)
Data:
scotdat.txt
ex1q1.dat
jan31.csv (data for question 3)
ex1q3.R (R code to read in the data for question 3)
Code:
ex1q1key.R (for question 1)
ex1q2key.R (for question 2)
ex1q3key.R (for question 3)
Thursday, 9 October 2008
Session 3: Disease mapping I
Disease mapping has a long history in epidemiology, and may be defined
as the estimation and presentation of summary measures of health
outcomes. The aims of disease mapping include simple description,
hypothesis generation, allocation of health care resources, assessment
of inequalities and the estimation of background variability in
underlying risk in order to place epidemiological studies into context.
In this session, we start by considering the instability of estimates
of relative risks, especially when dealing with small numbers. We then
develop methods for addressing this issue in the non-spatial context,
using real life examples implemented in R.
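The instability with small numbers can be seen in a short sketch (Python, for illustration only; the session itself uses R). The raw relative risk estimate for an area is the standardised mortality ratio, SMR = observed / expected. One common non-spatial remedy, assumed here and not necessarily the method developed in the notes, is Poisson-gamma shrinkage: with theta ~ Gamma(shape, rate) and O ~ Poisson(E * theta), the posterior mean is (shape + O) / (rate + E). All counts and prior values below are hypothetical.

```python
# Hypothetical (observed, expected) case counts; areas A and B have small populations.
areas = {"A": (0, 0.5), "B": (2, 0.5), "C": (95, 100.0), "D": (110, 100.0)}


def smr(observed, expected):
    """Raw relative risk estimate: observed / expected."""
    return observed / expected


def gamma_smoothed(observed, expected, shape, rate):
    """Posterior mean of theta when theta ~ Gamma(shape, rate) and O ~ Poisson(E * theta)."""
    return (shape + observed) / (rate + expected)


# Assumed prior with mean shape / rate = 1 (no excess risk); the values are illustrative.
SHAPE, RATE = 10.0, 10.0

for name, (o, e) in areas.items():
    raw = smr(o, e)
    smoothed = gamma_smoothed(o, e, SHAPE, RATE)
    print(f"{name}: SMR = {raw:.2f}, smoothed = {smoothed:.2f}")
```

With these inputs the two small areas have raw SMRs of 0 and 4 from a difference of only two cases, and both are pulled strongly towards the prior mean, while the estimates for the two large areas barely move.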
Notes:
Spatial epidemiology notes, part 2 (pdf file)
Exercises 2 (pdf file)
Data:
ohio.dat (text file)
ohio.readme (description of Ohio data - text file)
SMRSplusmap.txt (text file)
Code:
R functions (text file)
poly.R (text file)
Thursday, 16 October 2008
Session 4: Disease mapping II
In this session, we develop the methods for dealing with unstable
estimates of risk to incorporate a spatial component. The basic idea is
that we might expect risks in areas that are ‘close’ together to be more
similar than those which are not ‘close’. We want to exploit this
information in order to provide more reliable relative risk estimates
in each area.
SECTION 2: AN INTRODUCTION TO WinBUGS
CLICK HERE FOR WORKSHOP HOMEPAGE
Instructor: Dr Gavin Shaddick
Date: Saturday, October 18th, 2008
Location: Department of Statistics, University of British Columbia.
This workshop is aimed at statisticians, data analysts and
quantitative researchers who are interested in using WinBUGS to
perform Bayesian analysis. WinBUGS is a powerful tool that allows the
user to perform Markov chain Monte Carlo (MCMC) simulation. The day
will be split into a series of lectures and practicals, the latter with
hands-on data analysis.
Details and assistance on how to download and install the WinBUGS
software will be provided. Participants are encouraged to bring their
own laptops.
No previous experience of Bayesian methods or WinBUGS is necessary,
although familiarity with the basic principles of generalised linear
regression models will be assumed together with familiarity with
common probability distributions, e.g. normal, binomial, Poisson.
Laptops and Software download: Participants are asked to bring their
own laptops (running Windows) for the practicals if possible. We ask
that you work in pairs during the practical classes, so if you do not
have a laptop we can always pair you up with someone who does. The room
also has wireless internet access.
Notes and datasets: Notes for the workshop will be given out
and electronic copies of the code and data used in the
practicals made available for download.
There will be a separate website for this course, which can be found
here.
SECTION 3: WORKSHOPS IN BAYESIAN HIERARCHICAL MODELLING
DATES: Tuesday, 28 October 2008 and Thursday, 30 October 2008
PLACE: Room 301, Leonard S. Klinck Building, 6356 Agricultural Road, UBC
TIME: 11:00 a.m. (2 hours)
SPEAKER: Dr. Gavin Shaddick, Department of Mathematical Sciences, University of Bath, UK
TITLE: Bayesian spatial-temporal modelling (using WinBUGS)
ABSTRACT
In this series of tutorials, we explore how complex Bayesian
hierarchical models can be used in practice and in particular how they
can be implemented using the WinBUGS software. This learning module, in
conjunction with its two predecessors this term (Introduction to
spatial epidemiology and the WinBUGS workshop), will provide the skills
needed to produce and implement very complex models for phenomena
addressed in environmental health risk. To assist in achieving this
objective, a project will be suggested, and assessed for those
interested in submitting their results for review, as a follow up to
this series of learning modules. UBC students who attend all three of
these modules and who complete a piece of project work may petition
their supervisor to apply for credit under Stat548 (directed studies).
The work will be presented by working through an example, a
spatial-temporal model for modelling air pollutants. In conducting
studies to investigate the relationship between air pollution and
health, it is important to have a good measure of the level of
pollution on each of the study days. Often daily measurements are
available from a number of monitoring sites across the study area. Each
of these monitors may measure different sets of pollutants, there may
be periods of missing data, and each of the recorded measurements may
be subject to error. Shaddick & Wakefield (2002) proposed a
Bayesian hierarchical model for the analysis of such data, in which the
dependencies across time, space and pollutants are exploited.
Details of the model can be found in: Shaddick G and Wakefield JC.
Modelling multiple pollutants at multiple sites. Journal of
the Royal Statistical Society: Series C (Applied Statistics), 2002, vol
51, no 3, 351-372. doi: 10.1111/1467-9876.00273. The paper can be
found here (if on UBC network).
The different components of this model will be explained over the
tutorials, including the temporal and spatial structures, and the model
will be applied to data collected at eight sites within London, measuring
particulate matter (PM10), carbon monoxide (CO), nitrogen oxide (NO)
and sulphur dioxide (SO2) over the period 1994-97. The estimates of the
underlying levels of pollution can then be used to ‘map’ the pollution
field and for subsequent health analysis, with uncertainty in the
exposures being incorporated into the precision of the resulting
estimates of risk.
Session 1: Introductory seminar
In the first of the four sessions, the model will be presented in the
form of an introductory seminar. The abstract is as follows:
Modelling levels of pollution for use in time series studies examining the relationship between air pollution and health
Gavin Shaddick*, Jon Wakefield**
* Department of Mathematical Sciences, University of Bath
** Departments of Statistics and Biostatistics, University of Washington
In conducting time series studies to investigate the relationship
between air pollution and a health outcome, for example respiratory
mortality, it is important to have a good measure of the level of
pollution on any particular day. Often daily measurements are available
from a number of monitoring sites across the study area. Each of these
monitors may measure different sets of pollutants, there may be periods
of missing data, and all of the recorded measurements will be subject
to error. This paper describes the problems of combining such data to
produce estimates of the levels of pollution that can be used in
modelling the health outcome.
A hierarchical model that addresses the issues described is used for the
analysis; specifically, it allows information from multiple sites on
different pollutants to be combined. This allows an estimate of the
underlying pollution level for each pollutant at each site to be
obtained, incorporating any possible lag structure, along with a
measure of uncertainty. This is particularly useful for accounting for
the variation in the pollution level, whether formally via
error-in-variables modelling, or informally when interpreting the
regression coefficients describing the relationship between risk and
pollution.
These methods are used in assessing the relationship between
respiratory mortality and pollution in London for the period
1993-96. A number of pollutants, including PM10, CO, NO and SO2,
were measured at five sites in London and the available data used to
calculate daily estimates of the underlying levels of pollution.
Session 2: Single pollutant at a single site.
In this second session, we start by performing some initial data
analysis to observe the possible spatial and temporal structure in the
data. We then concentrate on setting out the hierarchical model.
We start with the simplest case of a single pollutant being measured at
a single site, which entails fitting a temporal model allowing for
dependencies over time. We see how this model can be fit in WinBUGS
using the data on PM10 for London.
Session 3: Single pollutant at multiple sites.
Here we develop the model to incorporate data from multiple monitoring
sites, i.e. we introduce a spatial component to the model. This
will be done by assuming that the measurements from the different sites
follow a multivariate normal distribution with structure in the
covariance matrix which reflects the fact that measurements made from
sites that are close together are likely to be more similar than those
far apart. We also explore how the resulting estimates from the models
can be used to predict levels of the pollutant in question at locations
where there are no monitoring sites, allowing ‘maps’ of pollution to be
produced, with corresponding estimates of uncertainty. Again, we will
see how such spatial models can be fit within WinBUGS and use the data
on PM10 for London as an example.
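To make the idea of a distance-dependent covariance matrix concrete, here is a minimal Python sketch (for illustration only; the session itself uses WinBUGS). It assumes an exponential decay, Cov(i, j) = sigma2 * exp(-phi * d_ij), where d_ij is the distance between sites i and j. The exponential form, the parameter values and the site coordinates are all assumptions made here; the text above only requires that nearby sites be more strongly related than distant ones.

```python
import math

# Hypothetical site coordinates in kilometres (not the London monitoring sites).
sites = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]


def exponential_covariance(coords, sigma2, phi):
    """Covariance matrix with entries sigma2 * exp(-phi * distance between sites i and j)."""
    return [
        [sigma2 * math.exp(-phi * math.hypot(xi - xj, yi - yj)) for (xj, yj) in coords]
        for (xi, yi) in coords
    ]


# sigma2 (variance) and phi (decay rate per km) are illustrative values.
cov = exponential_covariance(sites, sigma2=2.0, phi=0.3)
for row in cov:
    print("  ".join(f"{value:.3f}" for value in row))
```

The first two sites are 1 km apart and the third is 10 km from the first, so the covariance between the first pair is much larger than between the first and third. A matrix of this kind could serve as the covariance of the multivariate normal distribution described above.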
Session 4: Multiple pollutants at multiple sites.
Finally, we will combine the temporal and spatial aspects of the model
from the previous sessions with a multi-pollutant model which allows a
number of pollutants to be modelled simultaneously. The basic premise
of this is that the temporal structure is expanded to be multivariate
(normal). As the size of the data being used will have now grown
considerably (4 pollutants x 4 years of daily measurements x 8
locations), we will discuss the problems of implementing such models
within a Bayesian framework. These will include the computational
burden of running Markov chain Monte Carlo using WinBUGS on large
datasets, especially where there is a spatial structure, and we will
discuss the efficiency of using a selection of different approaches.
Notes:
Part one - introductory seminar material (pdf file)
Part two - implementing the models in WinBUGS (pdf file)
Combined version (as given in the workshop) (pdf file)
Data & code:
model1 (odc file)
model1-data (odc file)
model1-inits1 (odc file)
model1-inits2 (odc file)
model1CARNORMAL (odc file)
model2 (odc file)
model2-data (odc file)
model2-inits1 (odc file)
model2-inits2 (odc file)