GAVIN SHADDICK'S WEB PAGE AT UBC

This webpage provides information, notes and code for the series of workshops given by Gavin Shaddick at the Department of Statistics, University of British Columbia.

Contact: gavin@stat.ubc.ca

The workshops are:

   1. A series of four BRG tutorials on spatial epidemiology.
   2. An all-day workshop on WinBUGS (see the workshop webpage).
   3. Two extended seminar/tutorials on the use of Bayesian hierarchical models.

NEWS:

29/09/08: Registration for the WinBUGS workshop is now closed and an email has been sent to all participants regarding payment. If you registered but did not receive this email, please contact elaine@stat.ubc.ca. Further details will be sent around next week. There will be a separate website for this course, which can be found here.

26/10/08: Notes, code and data for the workshops in Bayesian hierarchical modelling are now available and can be found below. Hopefully we will get some time to actually run some of the models (in WinBUGS), so please feel free to bring along your laptop.

SECTION 1: BIOSTATISTICS RESEARCH SEMINAR

DATES: Thursdays - 25 September, 2, 9 and 16 October 2008 (4 sessions)
PLACE: Room 301, Leonard S. Klinck Building, 6356 Agricultural Road, UBC
TIME: 4:00 p.m.
SPEAKER: Dr. Gavin Shaddick, University of Bath, UK
TITLE: Introduction to Spatial Epidemiology

ABSTRACT

Epidemiology is the study of the distribution, causes and control of diseases in human populations. The risk of a disease may be determined by many factors, but these can largely be categorised as factors specific to individuals, those relating to time and those relating to space. Here we consider the third of these, in terms of location or place. In this sense, space is a surrogate for exposures present at a specific location, e.g. environmental exposures in water/air/soil, or the lifestyle characteristics of those living in particular areas.

In this series of four tutorial-style seminars, a general introduction to spatial epidemiology will be presented, together with an overview of a number of topics within the area, such as disease mapping, spatial regression and cluster detection, as time permits. We concentrate on the topic of disease mapping, in which we aim to provide information on a measure of disease occurrence across space. Often the results from such studies are presented in the form of maps of disease occurrence after the spatial dependence in the data has been exploited in order to produce ‘smoothed’ rates and provide better predictions.

Throughout the series, examples from epidemiological studies will be used to illustrate ideas, with implementation using R. A brief introduction to the use of WinBUGS when using Bayesian methods will also be given. Familiarity with the basic principles of generalized linear regression models will be assumed, together with familiarity with common probability distributions, e.g. normal, binomial, Poisson. A basic understanding of the principles of Bayesian analysis would be desirable, and preliminary reading can be suggested to this end.

Thursday, 25 September 2008
Session 1: Introduction to spatial epidemiology.

In this first session, an overview of spatial epidemiological studies will be given, concentrating on real examples from the literature. The basic concepts of epidemiological research will be introduced, including definitions of disease occurrence, types of observational studies, confounding, standardisation and the use of geographical information systems (GIS).

Notes:

A very basic introduction to epidemiology (powerpoint file)

A review of epidemiology (pdf file)
Brief version (slides, pdf file)

Thursday, 2 October 2008
Session 2: Statistical overview

In this second session, we give a statistical overview and put many of the concepts introduced in the first session into the form of statistical regression models. Estimation of parameters is discussed, primarily using the Poisson case as an example, in terms of likelihood, quasi-likelihood and Bayesian methods. Examples will be given, with implementation in R.
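
As a rough illustration of likelihood-based estimation in the Poisson case, the sketch below assumes the simplest model, in which the observed count Y in an area is Poisson with mean E x theta, where E is the expected count and theta the relative risk. Under that assumption the maximum likelihood estimate of theta is Y/E (the standardised mortality ratio), with an approximate standard error of 1/sqrt(Y) on the log scale. The counts are hypothetical, and the sketch is in Python purely for illustration; the course material itself uses R.

```python
import math


def poisson_loglik(theta, y, e):
    """Log-likelihood of Y ~ Poisson(e * theta), dropping the constant -log(y!)."""
    return y * math.log(e * theta) - e * theta


def smr_with_ci(y, e, z=1.96):
    """MLE of the relative risk (Y/E) with an approximate 95% interval.

    The interval is built on the log scale, where the standard error
    of log(theta_hat) is approximately 1 / sqrt(y).
    """
    theta_hat = y / e
    se_log = 1.0 / math.sqrt(y)
    lower = theta_hat * math.exp(-z * se_log)
    upper = theta_hat * math.exp(z * se_log)
    return theta_hat, lower, upper


# Hypothetical data: 30 observed cases where 20 were expected.
y, e = 30, 20.0
theta_hat, lower, upper = smr_with_ci(y, e)
print(f"SMR = {theta_hat:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```

The log-likelihood function is included so that the estimate can be checked numerically: it is larger at Y/E than at nearby values of theta.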

Notes:
Spatial epidemiology notes, part 1 (pdf file)
Exercises 1 (pdf file)
Solutions 1 (pdf file)

Data:
scotdat.txt
ex1q1.dat
jan31.csv (data for question 3)
ex1q3.R (R code to read in the data for question 3)

Code:

ex1q1key.R (for question 1)
ex1q2key.R (for question 2)
ex1q3key.R (for question 3)

Thursday, 9 October 2008
Session 3: Disease mapping I

Disease mapping has a long history in epidemiology, and may be defined as the estimation and presentation of summary measures of health outcomes. The aims of disease mapping include simple description, hypothesis generation, allocation of health care resources, assessment of inequalities and the estimation of background variability in underlying risk in order to place epidemiological studies into context. In this session, we start by considering the instability of estimates of relative risks, especially when dealing with small numbers. We then develop methods for addressing this issue in the non-spatial context, using real life examples implemented in R.
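
To see why small numbers cause instability, and how a non-spatial method can address it, the sketch below assumes a Poisson-gamma model: each count Y_i is Poisson with mean E_i x theta_i, and the relative risks theta_i share a Gamma(a, b) prior (shape a, rate b). The posterior mean of theta_i is then (a + Y_i) / (b + E_i), which pulls the raw ratio Y_i/E_i towards the prior mean a/b, most strongly where E_i is small. The prior values and counts are hypothetical, and this is only one possible smoothing approach, not necessarily the one developed in the notes.

```python
def raw_smr(y, e):
    """Raw relative risk estimate: observed over expected."""
    return y / e


def gamma_posterior_mean(y, e, a, b):
    """Posterior mean of theta under Y ~ Poisson(e * theta), theta ~ Gamma(a, rate=b)."""
    return (a + y) / (b + e)


# Hypothetical prior with mean a / b = 1 (no excess risk on average).
a, b = 4.0, 4.0

# Two hypothetical areas with the same raw SMR of 2 but very different sizes.
small_area = {"y": 1, "e": 0.5}
large_area = {"y": 100, "e": 50.0}

small_smoothed = gamma_posterior_mean(small_area["y"], small_area["e"], a, b)
large_smoothed = gamma_posterior_mean(large_area["y"], large_area["e"], a, b)

print(f"small area: raw {raw_smr(**small_area):.2f}, smoothed {small_smoothed:.2f}")
print(f"large area: raw {raw_smr(**large_area):.2f}, smoothed {large_smoothed:.2f}")
```

The estimate for the small area, which rests on a single case, is shrunk most of the way to 1, while the estimate for the large area barely moves.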

Notes:
Spatial epidemiology notes, part 2 (pdf file)
Exercises 2 (pdf file)

Data:

ohio.dat (text file)
ohio.readme (description of Ohio data - text file)
SMRSplusmap.txt (text file)

Code:
R functions (text file)
poly.R (text file)

Thursday, 16 October 2008
Session 4: Disease mapping II

In this session, we develop the methods for dealing with unstable estimates of risk to incorporate a spatial component. The basic idea is that we might expect risks in areas that are ‘close’ together to be more similar than those which are not ‘close’. We want to exploit this information in order to provide more reliable relative risk estimates in each area.
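
One common way to formalise ‘closeness’ is through a neighbourhood structure. The sketch below assumes an intrinsic conditional autoregressive (CAR) prior, under which the log relative risk in an area, given the values everywhere else, has mean equal to the average over its neighbours and variance equal to a common variance divided by the number of neighbours. The adjacency list, the log relative risks and the variance are all hypothetical, and the session may use a different formulation.

```python
# Hypothetical map: four areas in a row, each adjacent to the next.
neighbours = {
    0: [1],
    1: [0, 2],
    2: [1, 3],
    3: [2],
}

# Hypothetical log relative risks for the four areas.
log_rr = [0.2, 0.4, -0.1, 0.3]

# Hypothetical common variance parameter of the CAR prior.
sigma2 = 0.5


def car_conditional(area, values, adjacency, variance):
    """Conditional mean and variance of one area's value given its neighbours."""
    adjacent = adjacency[area]
    mean = sum(values[j] for j in adjacent) / len(adjacent)
    return mean, variance / len(adjacent)


conditional_means = []
conditional_vars = []
for area in sorted(neighbours):
    mean, var = car_conditional(area, log_rr, neighbours, sigma2)
    conditional_means.append(mean)
    conditional_vars.append(var)
    print(f"area {area}: conditional mean {mean:.3f}, variance {var:.3f}")
```

Areas with more neighbours have a smaller conditional variance, so their estimates borrow more strength from the surrounding areas.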

SECTION 2: AN INTRODUCTION TO WinBUGS

CLICK HERE FOR WORKSHOP HOMEPAGE

Instructor: Dr Gavin Shaddick
Date: Saturday, October 18th, 2008
Location: Department of Statistics, University of British Columbia.

This workshop is aimed at statisticians, data analysts and
quantitative researchers who are interested in using WinBUGS to
perform Bayesian analysis. WinBUGS is a powerful tool that allows the user to perform
Markov chain Monte Carlo (MCMC). The day will be split into a series of
lectures and practicals, the latter with hands-on data analysis.
Details and assistance on how to download and install the WinBUGS
software will be provided. Participants are encouraged to bring their
own laptops.

No previous experience of Bayesian methods or WinBUGS is necessary,
although familiarity with the basic principles of generalised linear
regression models will be assumed together with familiarity with
common probability distributions, e.g. normal, binomial, Poisson.

Laptops and software download: Participants are asked to bring their
own laptops (running Windows) for the practicals if possible. We ask that you work in
pairs during the practical classes, so if you do not have a laptop we
can always pair you up with someone who does. The room also has
wireless internet access.

Notes and datasets: Notes for the workshop will be given out
and electronic copies of the code and data used in the
practicals made available for download.

There will be a separate website for this course, which can be found here.

SECTION 3: WORKSHOPS IN BAYESIAN HIERARCHICAL MODELLING

DATES: Tuesday, 28 October 2008 and Thursday, 30 October 2008
PLACE: Room 301, Leonard S. Klinck Building, 6356 Agricultural Road, UBC
TIME: 11:00 a.m. (2 hours)
SPEAKER: Dr. Gavin Shaddick, Department of Mathematical Sciences, University of Bath, UK
TITLE: Bayesian spatial-temporal modelling (using WinBUGS)

ABSTRACT

In this series of tutorials, we explore how complex Bayesian hierarchical models can be used in practice and, in particular, how they can be implemented using the WinBUGS software. This learning module, in conjunction with its two predecessors this term (Introduction to spatial epidemiology and the WinBUGS workshop), will provide the skills needed to produce and implement very complex models for phenomena addressed in environmental health risk. To assist in achieving this objective, a project will be suggested as a follow-up to this series of learning modules, and will be assessed for those interested in submitting their results for review. UBC students who attend all three of these modules and who complete a piece of project work may petition their supervisor to apply for credit under Stat548 (directed studies).

The work will be presented by working through an example, a spatial-temporal model for modelling air pollutants. In conducting studies to investigate the relationship between air pollution and health, it is important to have a good measure of the level of pollution on each of the study days. Often daily measurements are available from a number of monitoring sites across the study area. Each of these monitors may measure different sets of pollutants, there may be periods of missing data, and each of the recorded measurements may be subject to error. Shaddick & Wakefield (2002) proposed a Bayesian hierarchical model for the analysis of such data, in which the dependencies across time, space and pollutants are exploited.

Details of the model can be found in: Shaddick G and Wakefield JC. Modelling multiple pollutants at multiple sites. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2002, vol 51, no 3, 351-372. doi: 10.1111/1467-9876.00273. The paper can be found here (if on UBC network).

The different components of this model, including the temporal and spatial structures, will be explained over the tutorials. The model will be applied to data collected at eight sites within London, measuring particulate matter (PM10), carbon monoxide (CO), nitrogen oxide (NO) and sulphur dioxide (SO2) over the period 1994-97. The estimates of the underlying levels of pollution can then be used to ‘map’ the pollution field and for subsequent health analysis, with uncertainty in the exposures being incorporated into the precision of the resulting estimates of risk.

Session 1: Introductory seminar

In the first of the four sessions, the model will be presented in the form of an introductory seminar. The abstract is as follows:

Modelling levels of pollution for use in time series studies examining the relationship between air pollution and health

Gavin Shaddick*, Jon Wakefield**

* Department of Mathematical Sciences, University of Bath

** Departments of Statistics and Biostatistics, University of Washington

In conducting time series studies to investigate the relationship between air pollution and a health outcome, for example respiratory mortality, it is important to have a good measure of the level of pollution on any particular day. Often daily measurements are available from a number of monitoring sites across the study area. Each of these monitors may measure different sets of pollutants, there may be periods of missing data, and all of the recorded measurements will be subject to error. This paper describes the problems of combining such data to produce estimates of the levels of pollution that can be used in modelling the health outcome.

A hierarchical model is used for the analysis. It addresses the issues described and, specifically, allows information from multiple sites on different pollutants to be combined. This allows an estimate of the underlying pollution level for each pollutant at each site to be obtained, incorporating any possible lag structure, along with a measure of uncertainty. This is particularly useful for accounting for the variation in the pollution level, whether formally via error-in-variables modelling, or informally when interpreting the regression coefficients describing the relationship between risk and pollution.

These methods are used in assessing the relationship between respiratory mortality and pollution in London for the period 1993-96. A number of pollutants, including PM10, CO, NO and SO2, were measured at five sites in London and the available data used to calculate daily estimates of the underlying levels of pollution.

Session 2: Single pollutant at a single site.

In this second session, we start by performing some initial data analysis to observe the possible spatial and temporal structure in the data. We then concentrate on setting out the hierarchical model. We start with the simplest case of a single pollutant being measured at a single site, which entails fitting a temporal model allowing for dependencies over time. We see how this model can be fit in WinBUGS using the data on PM10 for London.
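
To make the idea of a temporal model with dependencies over time more concrete, the sketch below simulates data of the kind such a model is meant to describe. It assumes, purely for illustration, that the underlying (log) pollution level follows a first-order autoregressive process and that each daily measurement equals the underlying level plus independent measurement error, with some days missing. The parameter values are hypothetical, and the model fitted in the workshop may have a different temporal structure.

```python
import random


def simulate_site(n_days, rho=0.8, mu=3.0, sd_state=0.2, sd_obs=0.1,
                  p_missing=0.1, seed=1):
    """Simulate an underlying AR(1) level and noisy, partly missing measurements.

    Returns two lists of length n_days: the underlying levels, and the
    observed measurements with None marking a missing day.
    """
    rng = random.Random(seed)
    level = mu
    truth, observed = [], []
    for _ in range(n_days):
        # Today's level depends on yesterday's departure from the mean.
        level = mu + rho * (level - mu) + rng.gauss(0.0, sd_state)
        truth.append(level)
        if rng.random() < p_missing:
            observed.append(None)  # monitor did not report on this day
        else:
            observed.append(level + rng.gauss(0.0, sd_obs))
    return truth, observed


truth, observed = simulate_site(n_days=30)
n_missing = sum(value is None for value in observed)
print(f"{len(observed)} days simulated, {n_missing} missing")
```

Fitting a hierarchical model to such data amounts to recovering the underlying levels, including those on the missing days, from the noisy measurements.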

Session 3: Single pollutant at multiple sites.

Here we develop the model to incorporate data from multiple monitoring sites, i.e. we introduce a spatial component to the model. This will be done by assuming that the measurements from the different sites follow a multivariate normal distribution with structure in the covariance matrix which reflects the fact that measurements made from sites that are close together are likely to be more similar than those far apart. We also explore how the resulting estimates from the models can be used to predict levels of the pollutant in question at locations where there are no monitoring sites, allowing ‘maps’ of pollution to be produced, with corresponding estimates of uncertainty. Again, we will see how such spatial models can be fit within WinBUGS and use the data on PM10 for London as an example.
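
A covariance matrix with this kind of structure can be sketched as follows. The example assumes an exponential form, in which the covariance between two sites is sigma2 x exp(-phi x distance); this is a common choice but only an assumption here, and the form used in the workshop model may differ. The site coordinates and parameter values are hypothetical.

```python
import math

# Hypothetical monitoring site coordinates (in km).
sites = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# Hypothetical parameters: overall variance and rate of decay with distance.
sigma2 = 1.0
phi = 0.1


def distance(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def exponential_covariance(points, variance, decay):
    """Covariance matrix with entries variance * exp(-decay * distance)."""
    return [
        [variance * math.exp(-decay * distance(p, q)) for q in points]
        for p in points
    ]


cov = exponential_covariance(sites, sigma2, phi)
for row in cov:
    print("  ".join(f"{value:.3f}" for value in row))
```

Sites 0 and 1 are 5 km apart and sites 0 and 2 are 10 km apart, so the first pair has the larger covariance. Prediction at an unmonitored location uses the same function to relate the new location to the existing sites.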

Session 4: Multiple pollutants at multiple sites.

Finally, we will combine the temporal and spatial aspects of the model from the previous sessions with a multi-pollutant model which allows a number of pollutants to be modelled simultaneously. The basic premise of this is that the temporal structure is expanded to be multivariate (normal). As the size of the data being used will have now grown considerably (4 pollutants x 4 years of daily measurements x 8 locations), we will discuss the problems of implementing such models within a Bayesian framework. These will include the computational burden of running Markov chain Monte Carlo using WinBUGS on large datasets, especially where there is a spatial structure, and we will discuss the efficiency of using a selection of different approaches.
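
The scale of the problem can be made explicit with a short calculation. The sketch below assumes complete calendar years from 1 January 1994 to 31 December 1997 (the period quoted for the London data above) and counts one potential measurement per pollutant, per site, per day; the actual dataset will contain fewer values because of missing data.

```python
from datetime import date

n_pollutants = 4
n_sites = 8

# Assumed study period: four complete calendar years, including leap year 1996.
start = date(1994, 1, 1)
end = date(1997, 12, 31)
n_days = (end - start).days + 1

# One potential measurement per pollutant, per site, per day.
n_values = n_pollutants * n_sites * n_days

# Pollutant-site combinations modelled jointly on each day.
n_per_day = n_pollutants * n_sites

print(f"{n_days} days, {n_per_day} values per day, {n_values} values in total")
```

Every one of these values is either an observation or a missing value to be estimated, and each MCMC iteration must update the corresponding underlying levels, which is why run time becomes a practical concern.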

Notes:

Part one - introductory seminar material (pdf file)
Part two - implementing the models in WinBUGS (pdf file)

Combined version (as given in the workshop) (pdf file)

Data & code:
model1 (odc file)
model1-data (odc file)
model1-inits1 (odc file)
model1-inits2 (odc file)

model1CARNORMAL (odc file)

model2 (odc file)
model2-data (odc file)
model2-inits1 (odc file)
model2-inits2 (odc file)