Seminars

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 11th December 2008
3:30pm
Division of Biostatistics, Indiana University School of Medicine
Impact of the design matrix structure on the performance of LASSO: Empirical study
High-throughput technologies in medical research have provided statisticians with ever-increasing amounts of data. One of the methodological and practical challenges in the analysis of such data is variable selection in regression models. The past 15 years have brought a formidable number of methods for variable selection when the number of covariates is much larger than the number of observations (p >> n). The majority of these methods fall under the category of penalized likelihood, which includes ridge regression, LASSO and its variations, SCAD and the Dantzig selector.

In our work, we provide simulation results on the performance of LASSO in the case of strong dependence between the columns of the design matrix X. We consider the estimation error, the prediction error and a measure of concordance between the true and selected variables. We study how the results depend on the design matrix specification, the "irrepresentability condition" of Zhao and Yu (2006) and the "phase transition" of Donoho and Stodden (2006). We also compare these results with the more common situation of orthogonality of the columns of X.

In the compound symmetry case, we find that the increased dependence between the columns of X results in larger estimation error, but decreased prediction error. In the anisotropic correlation case, both estimation and prediction errors are the largest when the covariates have both positive and negative correlations.
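
As a rough illustration of the compound symmetry setting (not the speaker's simulation code), a design matrix whose columns share one common pairwise correlation rho can be drawn by mixing a single shared standard normal factor with independent noise. The function names, the seed argument and the choice of unit-variance Gaussian columns are assumptions made for this sketch.

```python
import math
import random


def compound_symmetry_design(n, p, rho, seed=0):
    """Draw an n x p matrix (list of rows) whose columns have unit variance
    and common pairwise correlation rho, with 0 <= rho < 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # factor common to every column in this row
        rows.append([
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ])
    return rows


def sample_correlation(X, j, k):
    """Empirical correlation between columns j and k of X."""
    n = len(X)
    mean_j = sum(row[j] for row in X) / n
    mean_k = sum(row[k] for row in X) / n
    cov = sum((row[j] - mean_j) * (row[k] - mean_k) for row in X) / n
    var_j = sum((row[j] - mean_j) ** 2 for row in X) / n
    var_k = sum((row[k] - mean_k) ** 2 for row in X) / n
    return cov / math.sqrt(var_j * var_k)
```

Setting rho = 0 gives independent columns, which approximates the orthogonal comparison case; raising rho towards 1 strengthens the dependence between columns.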



Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Wed 10th December 2008
11:30am
Department of Statistics, Athens University of Economics & Business, Greece
Treating missing values in discrete valued time series models
Time series models for count data have found increased interest in several different disciplines, including epidemiology, marketing, criminology and accident analysis. The existing literature refers to the case of data that have been fully observed; series containing missing values have been overlooked. In the present paper, methods for estimating the parameters in the presence of missing data are proposed. We propose two different strategies. The first method maximizes a conditional likelihood constructed from the observed data, based on the k-step-ahead conditional distributions to account for the gaps in the data. This conditional likelihood can have a simple form, depending on the form assumed for the innovations of the model. The second approach is based on an iterative scheme where missing values are imputed in order to update the estimated parameters. Particular methods of imputation of the missing data are discussed. We treat in detail the case of INAR models. Finally, the proposed methods are applied to a data set concerning syndromic surveillance during the Athens 2004 Olympic Games. The paper concludes by discussing possible extensions to other models.
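
To make the first strategy concrete, here is a minimal sketch of what a k-step-ahead conditional likelihood could look like for one specific case: an INAR(1) model with binomial thinning and Poisson innovations. That model choice is an assumption for illustration, not something stated in the abstract. In that case the value k steps after an observation x is the sum of a Binomial(x, alpha^k) count and a Poisson count with mean lam * (1 - alpha^k) / (1 - alpha), so each gap contributes one such term. The function names and the use of None to mark missing values are hypothetical.

```python
import math


def poisson_pmf(j, mu):
    return math.exp(-mu) * mu ** j / math.factorial(j)


def binom_pmf(i, n, p):
    return math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)


def kstep_pmf(y, x, k, alpha, lam):
    """P(X[t+k] = y | X[t] = x) for a Poisson INAR(1) with 0 < alpha < 1."""
    a_k = alpha ** k                           # k-fold thinning probability
    mu_k = lam * (1.0 - a_k) / (1.0 - alpha)   # mean of accumulated innovations
    return sum(
        binom_pmf(i, x, a_k) * poisson_pmf(y - i, mu_k)
        for i in range(min(x, y) + 1)
    )


def conditional_loglik(series, alpha, lam):
    """Sum of log k-step-ahead probabilities between consecutive observed
    values; entries equal to None are treated as missing."""
    loglik = 0.0
    last_t = None
    for t, y in enumerate(series):
        if y is None:
            continue
        if last_t is not None:
            loglik += math.log(kstep_pmf(y, series[last_t], t - last_t, alpha, lam))
        last_t = t
    return loglik
```

For example, `conditional_loglik([3, None, None, 2], 0.4, 2.0)` uses a single 3-step-ahead term, and the result could be passed to any numerical optimizer over alpha and lam.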
Statistics / BRG
WMAX 110, West Mall, UBC
Thu 13th November 2008
3:30pm
Department of Biostatistics, University of Michigan
Statistical analysis of Illness-Death and Semi-competing Risks Data
Semi-competing risks data frequently arise in clinical and observational studies. In these cases, the subject can experience both non-terminal and terminal events, where the terminal event (e.g., death) censors the non-terminal event (e.g., relapse) but not vice versa. Typically, the two events are correlated. An approach based on latent failure times has been advocated for the analysis of such data, where the joint survival function of the two event times is assumed to follow a copula function over the positive quadrant, with observation restricted to the upper wedge. We argue that, as with models for competing risks, latent failure times should generally be avoided in modeling such data. We consider an illness-death process which circumvents any need for latent times and provides for easy incorporation of covariates. Nonparametric maximum likelihood estimation is used for inference, a simple iterative procedure is developed, and the needed asymptotic results are obtained. Simulation studies are conducted to assess the finite sample performance of the proposed estimators, and the methods are illustrated in an analysis of data on nasopharyngeal cancer from a randomized clinical trial in Singapore.

This is joint work with Jinfeng Xu and Beechoo Tai, National University of Singapore.

 

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 30th October 2008
11:00am
Department of Mathematical Sciences, University of Bath, UK
Workshop: Bayesian Hierarchical Models (3 and 4 of 4 tutorials) - 2 hours
In this series of tutorials, we explore how complex Bayesian hierarchical models can be used in practice, and in particular how they can be implemented using the WinBUGS software. This learning module, in conjunction with its two predecessors this term (Introduction to spatial epidemiology and the WinBUGS workshop), will provide the skills needed to produce and implement very complex models for phenomena addressed in environmental health risk. To assist in achieving this objective, a project will be suggested as a follow-up to this series of learning modules, and will be assessed for those interested in submitting their results for review. UBC students who attend all three of these modules and who complete a piece of project work may petition their supervisor to apply for credit under Stat548 (directed studies).
 
The work will be presented by working through an example, a spatial-temporal model for modelling air pollutants. In conducting studies to investigate the relationship between air pollution and health, it is important to have a good measure of the level of pollution on each of the study days. Often daily measurements are available from a number of monitoring sites across the study area. Each of these monitors may measure different sets of pollutants, there may be periods of missing data, and each of the recorded measurements may be subject to error. Shaddick & Wakefield (2002) proposed a Bayesian hierarchical model for the analysis of such data, in which the dependencies across time, space and pollutants are exploited.

Details of the model can be found in: Shaddick G and Wakefield, JC. Modelling multiple pollutants at multiple sites. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2002, vol 51, no 3, 351-372.  doi: 10.1111/1467-9876.00273

The different components of this model will be explained over the tutorials, including the temporal and spatial structures and the model applied to data collected at eight sites within London, measuring particulate matter (PM10), carbon monoxide (CO), nitrogen oxide (NO) and sulphur dioxide (SO2) over the period 1994-97. The estimates of the underlying levels of pollution can then be used to ‘map’ the pollution field and for subsequent health analysis, with uncertainty in the exposures being incorporated into the precision of the resulting estimates of risk.

Session 3: Single pollutant at multiple sites.

Here we develop the model to incorporate data from multiple monitoring sites, i.e. we introduce a spatial component to the model. This will be done by assuming that the measurements from the different sites follow a multivariate normal distribution with structure in the covariance matrix which reflects the fact that measurements made from sites that are close together are likely to be more similar than those far apart. We also explore how the resulting estimates from the models can be used to predict levels of the pollutant in question at locations where there are no monitoring sites, allowing ‘maps’ of pollution to be produced, with corresponding estimates of uncertainty. Again, we will see how such spatial models can be fit within WinBUGS and use the data on PM10 for London as an example.
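
As a small illustration of a covariance matrix in which nearby sites are more alike, the sketch below builds a matrix whose entries decay with the distance between sites. The exponential decay form, the parameter names sigma2 (variance) and phi (range), and the site coordinates are assumptions for this example; the tutorial may use a different specification.

```python
import math


def exponential_covariance(coords, sigma2, phi):
    """Covariance matrix with entries sigma2 * exp(-d_ij / phi), where d_ij
    is the Euclidean distance between sites i and j."""
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            cov[i][j] = sigma2 * math.exp(-d / phi)
    return cov


# Three hypothetical monitoring sites: two close together, one far away.
sites = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
cov = exponential_covariance(sites, sigma2=2.0, phi=5.0)
```

Here the covariance between the first two sites is larger than that between the first and third, which is the property the multivariate normal model relies on when predicting at unmonitored locations.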

Session 4: Multiple pollutants at multiple sites.

Finally, we will combine the temporal and spatial aspects of the model from the previous sessions with a multi-pollutant model which allows a number of pollutants to be modelled simultaneously. The basic premise of this is that the temporal structure is expanded to be multivariate (normal). As the size of the data being used will have now grown considerably (4 pollutants x 4 years of daily measurements x 8 locations), we will discuss the problems of implementing such models within a Bayesian framework. These will include the computational burden of running Markov chain Monte Carlo using WinBUGS on large datasets, especially where there is a spatial structure, and we will discuss the efficiency of a selection of different approaches.


Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 28th October 2008
11:00am
Department of Mathematical Sciences, University of Bath, UK
Workshop: Bayesian Hierarchical Models (1 and 2 of 4 tutorials) - 2 hours
In this series of tutorials, we explore how complex Bayesian hierarchical models can be used in practice, and in particular how they can be implemented using the WinBUGS software. This learning module, in conjunction with its two predecessors this term (Introduction to spatial epidemiology and the WinBUGS workshop), will provide the skills needed to produce and implement very complex models for phenomena addressed in environmental health risk. To assist in achieving this objective, a project will be suggested as a follow-up to this series of learning modules, and will be assessed for those interested in submitting their results for review. UBC students who attend all three of these modules and who complete a piece of project work may petition their supervisor to apply for credit under Stat548 (directed studies).
 
The work will be presented by working through an example, a spatial-temporal model for modelling air pollutants. In conducting studies to investigate the relationship between air pollution and health, it is important to have a good measure of the level of pollution on each of the study days. Often daily measurements are available from a number of monitoring sites across the study area. Each of these monitors may measure different sets of pollutants, there may be periods of missing data, and each of the recorded measurements may be subject to error. Shaddick & Wakefield (2002) proposed a Bayesian hierarchical model for the analysis of such data, in which the dependencies across time, space and pollutants are exploited.

Details of the model can be found in: Shaddick G and Wakefield, JC. Modelling multiple pollutants at multiple sites. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2002, vol 51, no 3, 351-372.  doi: 10.1111/1467-9876.00273

The different components of this model will be explained over the tutorials, including the temporal and spatial structures and the model applied to data collected at eight sites within London, measuring particulate matter (PM10), carbon monoxide (CO), nitrogen oxide (NO) and sulphur dioxide (SO2) over the period 1994-97. The estimates of the underlying levels of pollution can then be used to ‘map’ the pollution field and for subsequent health analysis, with uncertainty in the exposures being incorporated into the precision of the resulting estimates of risk.

Session 1: Introductory seminar
In the first of the four sessions, the model will be presented in the form of an introductory seminar. The abstract is as follows:

Modelling levels of pollution for use in time series studies examining the relationship between air pollution and health

Gavin Shaddick*, Jon Wakefield**

* Department of Mathematical Sciences, University of Bath

** Departments of Statistics and Biostatistics, University of Washington

In conducting time series studies to investigate the relationship between air pollution and a health outcome, for example respiratory mortality, it is important to have a good measure of the level of pollution on any particular day. Often daily measurements are available from a number of monitoring sites across the study area. Each of these monitors may measure different sets of pollutants, there may be periods of missing data, and all of the recorded measurements will be subject to error. This paper describes the problems of combining such data to produce estimates of the levels of pollution that can be used in modelling the health outcome.

A hierarchical model that addresses the issues described is used for the analysis; specifically, it allows information from multiple sites on different pollutants to be combined. This allows an estimate of the underlying pollution level for each pollutant at each site to be obtained, incorporating any possible lag structure, along with a measure of uncertainty. This is particularly useful for accounting for the variation in the pollution level, whether formally via error-in-variables modelling, or informally when interpreting the regression coefficients describing the relationship between risk and pollution.

These methods are used in assessing the relationship between respiratory mortality and pollution in London for the period 1993-96. A number of pollutants, including PM10, CO, NO and SO2, were measured at five sites in London and the available data were used to calculate daily estimates of the underlying levels of pollution.

Session 2: Single pollutant at a single site.

In this second session, we start by performing some initial data analysis to observe the possible spatial and temporal structure in the data. We then concentrate on setting out the hierarchical model. We start with the simplest case of a single pollutant being measured at a single site, which entails fitting a temporal model allowing for dependencies over time. We see how this model can be fit in WinBUGS using the data on PM10 for London.

Sessions 3 and 4 will take place on Thursday, 30 October 2008 at 11:00 a.m.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Sat 18th October 2008
9:30am
Department of Mathematical Sciences, University of Bath, UK
Introduction to WinBUGS, all day workshop
This workshop is aimed at statisticians, data analysts and quantitative researchers who are interested in using WinBUGS to perform Bayesian analysis. WinBUGS is a powerful tool that allows the user to perform Markov chain Monte Carlo (MCMC). The day will be split into a series of lectures and practicals, the latter with hands-on data analysis. Details and assistance on how to download and install the WinBUGS software will be provided. Participants are encouraged to bring their own laptops.

No previous experience of Bayesian methods or WinBUGS is necessary, although familiarity with the basic principles of generalised linear regression models will be assumed together with familiarity with common probability distributions, e.g.  normal, binomial, Poisson.

Laptops and Software download: Participants are asked to bring their own laptops (running Windows) for the practicals if possible. We ask that you work in pairs during the practical classes, so if you do not have a laptop we can always pair you up with someone who does. The room also has wireless internet access.

Notes and datasets: Notes for the workshop will be given out and electronic copies of the code and data used in the practicals made available for download.

Payment: There will be a cost of $40 for the workshop to cover overheads (including refreshments and lunch).

Application for attendance: Places for the workshop are limited and will be available on a first come first served basis. Potential participants should complete the registration document and return it to elaine@stat.ubc.ca by noon, Thursday 18th September 2008. Successful applicants will be notified and asked to make payment by noon, Wednesday 1st October 2008 to secure their place in this workshop. Details on how payment should be made will be sent out with the email noting acceptance.

Provisional schedule:

0930-1100 Lecture 1: Introduction to Bayesian analysis, MCMC and WinBUGS

1100-1115 Coffee

1115-1230 Practical 1: Getting started with WinBUGS

1230-1330 Lunch

1330-1430 Lecture 2: Introduction to Bayesian inference

1430-1530 Practical 2: Using WinBUGS for conjugate analysis of binary, Poisson and Normal data

1530-1545 Coffee

1545-1630 Lecture 3: Bayesian linear regression models

1630-1715 Practical 3: Linear regression modelling using WinBUGS

1715-1730 Wrap up


BRG
Leonard S. Klinck 462, 6356 Agricultural Road, UBC
Thu 16th October 2008
4:00pm
Department of Mathematical Sciences, University of Bath, UK
Introduction to spatial epidemiology: Disease mapping II (Session 4 of 4)
In this session, we develop the methods for dealing with unstable estimates of risk to incorporate a spatial component. The basic idea is that we might expect risks in areas that are ‘close’ together to be more similar than those in areas that are not ‘close’. We want to exploit this information in order to provide more reliable relative risk estimates in each area. The presentation of spatial data using maps in R will be explored, together with an introduction to fitting spatial models in WinBUGS.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 14th October 2008
11:00am
Visiting Scholar from Nankai University, P.R. China
Adjusted Empirical Likelihood with High-Order Precision
Empirical likelihood is a popular nonparametric or semi-parametric statistical method with many nice statistical properties. Yet when the sample size is small, or the dimension of the accompanying estimating function is high, the application of the empirical likelihood method can be hindered by the low precision of the chi-square approximation or by estimating equations that have no solution. In this paper, the adjusted empirical likelihood is found to be effective at addressing both problems. If we choose a precise level of adjustment, the adjusted empirical likelihood achieves the high-order precision of the Bartlett correction, in addition to the advantage of a guaranteed solution to the estimating equations. Simulation results indicate that the confidence regions constructed by the adjusted empirical likelihood have coverage probabilities comparable to or substantially more accurate than those of the original empirical likelihood enhanced by the Bartlett correction.

 

BRG
Leonard S. Klinck 462, 6356 Agricultural Road, UBC
Thu 9th October 2008
4:00pm
Department of Mathematical Sciences, University of Bath, UK
Introduction to spatial epidemiology: Disease mapping I (Session 3 of 4)
Disease mapping has a long history in epidemiology, and may be defined as the estimation and presentation of summary measures of health outcomes. The aims of disease mapping include simple description, hypothesis generation, allocation of health care resources, assessment of inequalities and the estimation of background variability in underlying risk in order to place epidemiological studies into context. In this session, we start by considering the instability of estimates of relative risks, especially when dealing with small numbers. We then develop methods for addressing this issue in the non-spatial context, using real life examples implemented in R.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 7th October 2008
11:30am
Graduate Student, Department of Statistics, UBC
Evaluating the Performance of Simulation Extrapolation and Bayesian Adjustments
Measurement error is a frequent issue in many research areas. For instance, in health research it is often of interest to understand the relationship between an outcome and an exposure, which is likely mismeasured if the study is observational or a gold standard is costly or absent.  Measurement error in the explanatory variable can have serious effects, such as biased parameter estimation, and its structure is usually not known to the investigators. We compare our proposed Bayesian approach to the commonly used simulation extrapolation method.  The Bayesian model incorporates the uncertainty of the measurement error variance and the posterior distribution is generated by using Markov chain Monte Carlo algorithms.  The comparison between the Bayesian and simulation extrapolation approaches is conducted using different cases of simulated data including validation data, as well as the Framingham Heart Study data which provides replicates but no validation data.  The underlying theme of this talk is the uncertainty involved in the estimation of the measurement error variance.  We investigate how accurately this parameter has to be estimated and how confident one has to be about this estimate in order to produce better results by choosing the Bayesian measurement error correction over the naive analysis where measurement error is ignored.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 7th October 2008
11:00am
Graduate Student, Department of Statistics, UBC
An Efficient Computational Approach for Prior Sensitivity Analysis and Cross-Validation
Prior sensitivity analysis and cross-validation are important tools in Bayesian statistics. However, due to the computational expense of implementing existing methods, these techniques are rarely used. In this talk we show how it is possible to use sequential Monte Carlo to create an efficient and automated algorithm to perform these tasks. We apply the algorithm to the creation of regularization path plots and to check the sensitivity of the tuning parameter in g-prior model selection. We then demonstrate the algorithm applied to cross-validation and use it to select the shrinkage parameter in Bayesian penalized regression.

BRG
Leonard S. Klinck 462, 6356 Agricultural Road, UBC
Thu 2nd October 2008
4:00pm
Department of Mathematical Sciences, University of Bath, UK
Introduction to spatial epidemiology: Statistical overview (Session 2 of 4)
In this second session, we give a statistical overview and put many of the concepts introduced in the first session into the form of statistical regression models. Estimation of parameters is discussed, primarily using the Poisson case as an example, in terms of likelihood, quasi-likelihood and Bayesian methods. Examples will be given, with implementation in R.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 30th September 2008
11:00am
Department of Psychology, UBC. Sponsored by the MITACS Project on Statistical Methods for Complex Surveys
A Two-Stage Approach to Missing Data: Theory and Application to Auxiliary Variables
Covariance structure analysis is concerned with testing hypotheses about the structure of the population covariance matrix. Applications include simultaneous equation models, factor models, and full structural equation models. In the presence of missing data, a popular ad-hoc approach to conducting such an analysis is to first obtain the saturated maximum likelihood (ML) estimate of the covariance matrix (sometimes called the "EM covariance matrix"), and then to proceed to estimate the structured parameters treating this matrix as if it were obtained from complete data. This two-stage (TS) approach is appealing because the first stage is easily done, and the second stage reduces the problem to a familiar complete data problem. An additional advantage of the TS approach is that it allows for easy incorporation of auxiliary variables in stage 1, which may be important in predicting missingness, yet allows them to be ignored completely in stage 2, reducing the dimension of the problem. The main disadvantage is that the standard errors and test statistics obtained in stage 2 will not be correct. In this talk, I will describe how to obtain correct standard errors and test statistics for the parameters obtained in stage 2 of this approach, with both MCAR and MAR normally distributed data. I compare this approach to a direct maximum likelihood approach. While the TS approach is marginally less efficient, it performs extremely well, and its test statistic outperforms the test statistic from the direct ML approach. The TS method is recommended for use with missing data.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 25th September 2008
4:00pm
Department of Mathematical Sciences, University of Bath, UK
Introduction to spatial epidemiology (Session 1 of 4)
In this first session, an overview of spatial epidemiological studies will be given, concentrating on real examples from the literature. The basic concepts of epidemiological research will be introduced, including definitions of disease occurrence, types of observational studies, confounding, standardisation and the use of geographical information systems (GIS).
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 16th September 2008
11:00am
Director, Business and Industrial Statistics Research Group (BISRG); Associate Chair, Undergraduate Studies; and Associate Professor, Dept. of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada N2L 3G1
Teaching Applied Statistics Using a Virtual Manufacturing Process
This non-technical talk describes an innovative and successful use of technology, through a virtual process, to aid in the teaching of statistical concepts and methodology. The virtual process simulates a manufacturing process for automobile camshafts that has a number of processing steps and many inputs.
 
At the start of an upper year undergraduate course Stat 435/835: Statistical Methods for Process Improvement, each team of students is given a budget and assigned the task of reducing variation in a critical output characteristic of a different version of the virtual process. Throughout the term, the teams plan and analyze a series of process investigations (~1/week) to first learn about how their process works and, by the end of term, how to improve it. The teams interact with the virtual process through a web interface. Each team submits a weekly written report describing their recent progress and twice per term presents to the class at a “management review meeting.” The virtual process is also used as the context for all midterms and exams. Based on anecdotal evidence and survey results, students find interacting with the virtual process fun, stimulating and challenging.
 
The goals of this talk are to show how the virtual process aids in the teaching of material and concepts in Stat 435/835 and to describe its main pedagogical benefits. With thought and some adaptation something similar should be possible for other applied statistics courses.
 

 

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 9th September 2008
11:00am
Department of Mathematical Sciences, University of Bath, UK
Estimating exposure response functions using ambient pollution concentrations
This paper describes how a probabilistic model to estimate personal exposures to airborne pollutants can be used to assess the health effects of air pollution. A computer model is used to simulate the exposures experienced by individuals in an urban area, using data on ambient concentrations and temperature, whilst incorporating the mechanisms that might determine exposures. The output from the model comprises a set of daily exposures for a sample of individuals from the population of interest. These daily exposures are then approximated by parametric distributions, so that the predictive exposure distribution of a randomly selected individual can be generated. These distributions are then incorporated into a hierarchical Bayesian framework (with inference using Markov Chain Monte Carlo simulation) in order to examine the relationship between short-term changes in exposures and health outcomes, whilst making allowance for long-term trends, seasonality, the effect of potential confounders and the possibility of ecological bias.

This approach is applied to a case study comprising data on particulate pollution (PM10) and respiratory mortality counts for seniors in greater London during 1997. Within this substantive epidemiological study, the effects on health of ambient concentrations and (estimated) personal exposures are compared. The proposed model incorporates within-day (or between-individual) variability in personal exposures, which is compared to the more traditional approach of assuming a single pollution level applies to the entire population for each day. Effects were estimated using single lags and distributed lag models, with the highest relative risk, RR=1.02 (1.01-1.04), being associated with a two-day lag in ambient concentrations of PM10. Individual exposures to PM10 for this group (seniors) were lower than the measured ambient concentrations, with the corresponding risk, RR=1.05 (1.01-1.09), being higher than would be suggested by the traditional approach using ambient concentrations.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 26th August 2008
3:00pm
Jieyun (Erin) Ding
MSc Student, Department of Statistics, UBC
Mixed Effects Models with Incompletely Observed Data, with application to AIDS studies
Mixed effects models are popular in longitudinal data analyses. They are particularly favored for their ability to incorporate both between-individual and within-individual variations. However, statistical inferences are often complicated by incomplete data problems, which are very common in longitudinal studies. In this project, we propose a multiple imputation (MI) method for nonlinear mixed effects (NLME) models with missing data in time-dependent covariates. The MI method takes the missing data uncertainty into account. We analyze a real AIDS dataset using the proposed MI method, and compare the results to some commonly used simple methods. Simulation results confirm that our MI method produces more reliable estimation than the naive methods. For the real dataset, we also propose an alternative analysis approach that exchanges the roles of the response variable and the covariates, which is another possibility in such an AIDS study. This approach accounts for measurement errors in the covariates. For this alternative model, we propose a computationally efficient approximate method for inference. This approach is illustrated on the same AIDS dataset.


Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 26th August 2008
3:00pm
Jing (Gina) Qu
MSc Student, Department of Statistics, UBC
Composite Likelihood for a Stochastic Volatility Model for Financial Time Series
Many financial time series have been found to be heteroscedastic with autocorrelated volatility. The traditional and widely used approaches for modeling volatility are the autoregressive conditional heteroscedastic (ARCH) and generalized autoregressive conditional heteroscedastic (GARCH) models. The stochastic volatility (SV) model is an alternative to these models. Compared with ARCH and GARCH models, the main difficulty for the SV model is the evaluation of the likelihood function, which involves calculating a multi-dimensional integral. The Composite Likelihood method presented here is an alternative approach for estimating the SV model. The idea of Composite Likelihood is to reduce the dimension of the integrals needed to calculate the likelihood. For the bivariate case, the composite log-likelihood is the sum of the log-likelihoods of consecutive pairs. The performance of the composite likelihood method will be compared with quasi-maximum likelihood (QML) and Monte Carlo Likelihood (MCL). Some interesting findings will also be discussed.
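
The "sum over consecutive pairs" construction can be sketched as follows. Because the bivariate density of the SV model itself requires numerical integration, this example swaps in a stationary Gaussian AR(1) process, whose pair density is available in closed form; that substitution, the function names and the parameters phi and sigma are assumptions made purely for illustration.

```python
import math


def pair_logpdf(x, y, var, corr):
    """Log density of a zero-mean bivariate normal with common variance
    var and correlation corr, evaluated at (x, y)."""
    one_minus_r2 = 1.0 - corr ** 2
    quad = (x * x - 2.0 * corr * x * y + y * y) / (var * one_minus_r2)
    return (-math.log(2.0 * math.pi) - math.log(var)
            - 0.5 * math.log(one_minus_r2) - 0.5 * quad)


def pairwise_composite_loglik(series, phi, sigma):
    """Composite log-likelihood: sum of bivariate log densities of the
    consecutive pairs (y[t], y[t+1]) under a stationary Gaussian AR(1)
    with coefficient phi (|phi| < 1) and innovation s.d. sigma."""
    var = sigma ** 2 / (1.0 - phi ** 2)  # stationary marginal variance
    return sum(
        pair_logpdf(series[t], series[t + 1], var, phi)
        for t in range(len(series) - 1)
    )
```

Maximizing this function over phi and sigma would give a composite likelihood estimate for the stand-in model; for the SV model the same outer sum applies, with each pair term replaced by a two-dimensional integral.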

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 12th August 2008
11:00am
Department of Statistics and Operations Research, Public University of Navarra, Pamplona, Spain
Benchmarked estimates in small areas using linear mixed models with restrictions

Linear mixed models have been frequently used to provide estimates in small areas. However, when aggregating small areas within the same region, the sum of these small area estimates does not generally match up with the estimate obtained using an appropriate estimator for the larger region.

Benchmarking the model-dependent estimates to those obtained at a certain level of aggregation is therefore needed.

In this paper, we propose a small area estimator based on a linear mixed effects model with restrictions to guarantee the concordance between the aggregations of small area estimates and those reported by statistical agencies for larger domains using a synthetic estimator. The mean squared prediction error of the restricted estimator is also derived and its performance is evaluated through a simulation study.

The procedure is applied to the 2002 Business Survey of the Basque Country, Spain. This is joint work with Ana F. Militino and T. Goicoa.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 5th August 2008
2:00pm
Dong Ho Park
Department of Information and Statistics Hallym University Korea
Role of life distributions in reliability research and current issues

This talk includes a review of the importance of reliability research on life distributions in various areas.

The reliability of a system can be interpreted as the "probability that, when operating under stated environmental conditions, the system will perform its intended functions adequately for a specified interval of time". The life distribution describes the behavior of a subject (such as human life, life length of a system, etc.) as a function of its age.

A brief introduction to the literature on life distributions will be presented and several criteria of classifying life distributions will be discussed. Some properties of nonparametric classes of life distributions and their applications in system maintenance, software reliability and network reliability will be presented.
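
The verbal definition of reliability quoted above can be written compactly. The symbols are assumed for illustration: T is the lifetime, F its distribution function and f its density.

```latex
R(t) = P(T > t) = 1 - F(t), \qquad h(t) = \frac{f(t)}{R(t)}
```

Here R is the reliability (survival) function and h the failure rate. Nonparametric classes of life distributions are typically defined through shape properties of such functions; the increasing failure rate (IFR) class, for example, requires h to be nondecreasing in t.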

 

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 22nd July 2008
11:00am
Universidad de Extremadura, Cáceres, Spain
Modelling Circular Data

Abstract:
Circular data arise in scientific disciplines as diverse as meteorology,
the earth sciences, biology, medicine and the political
sciences. For instance, they might represent the directions
of prevailing winds in the vicinity of a wind farm, the
orientations of fault lines in geological bedrock, the directions
of migrating birds, the degree of flexibility of the legs of injured
cyclists, or the times of violent attacks in occupied Iraq.

In my talk I will provide an introduction to the world
of circular statistics; starting with some illustrative data
sets and moving on to a consideration of the standard
descriptive measures used in the analysis of circular data, their
population counterparts and results for large sample inference.
As a spinoff, the latter provide a simple means of testing
for circular reflective symmetry. The major part of the talk will be
devoted to a consideration of models for circular data with emphasis
on the limitations of classical models and the flexibility
of alternatives proposed recently in the literature. The
application of some of the models, and the inference
associated with them, will be illustrated using real data.
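
To make the "standard descriptive measures" concrete, here is a minimal Python sketch of the sample mean direction and mean resultant length. The function name and the wind directions are hypothetical and are not taken from the talk.

```python
import math


def circular_summary(angles_deg):
    """Return (mean direction in degrees, mean resultant length) of a sample of angles."""
    n = len(angles_deg)
    # Average the unit vectors that the angles define on the circle
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r_bar = math.hypot(c, s)  # close to 1 when the angles are tightly concentrated
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    return mean_dir, r_bar


# Hypothetical wind directions clustered around north (0 degrees)
wind_dirs = [350.0, 10.0, 355.0, 5.0]
mean_dir, r_bar = circular_summary(wind_dirs)
```

The arithmetic mean of these four angles is 180 degrees, which points due south. The vector-based mean direction lies at north instead, which is why linear summaries are unsuitable for circular data.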
 

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 27th June 2008
2:00pm
Supervised learning for graph-structured data

Graph-structured networks are often used to represent relationships between persons in organizations or communities, and the need to classify such data is growing. We consider two scenarios in which graph-structured data can arise: a directly observed social network, and a social network implied by email transactions.

In both cases, the data are in an unusual format (a graph with edges, or a long list of transactions). In order to apply "off the shelf" supervised learning methods, we first map the unusual data into a more familiar format, converting the raw data into a matrix of predictors (or features). Complementary information arises from these two scenarios, with one yielding features that capture "who talks to whom" and the other, "how one interacts with others". Some examples will be given to demonstrate these techniques.
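
The mapping from a transaction list to a feature matrix might look like the following Python sketch. The three features (messages sent, messages received, number of distinct contacts) and the email data are hypothetical choices made for illustration, not the features used in the talk.

```python
from collections import defaultdict


def edge_list_to_features(edges):
    """Map directed (sender, recipient) pairs to one feature row per person."""
    sent = defaultdict(int)
    received = defaultdict(int)
    contacts = defaultdict(set)
    for sender, recipient in edges:
        sent[sender] += 1
        received[recipient] += 1
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
    people = sorted(contacts)
    # Columns: messages sent, messages received, distinct contacts
    rows = [[sent[p], received[p], len(contacts[p])] for p in people]
    return people, rows


# Hypothetical email transactions
emails = [("ann", "bob"), ("ann", "bob"), ("bob", "cat"), ("cat", "ann")]
people, X = edge_list_to_features(emails)
```

Each row of `X` can then be paired with a class label and passed to any standard classifier.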

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 20th May 2008
11:00am
Centre for Spatial Information Science University of Tokyo
A new Bayesian variable selection criterion based on a g-prior extension for p > n

For the normal linear model regression setup, we extend Zellner's g-prior to the case where the number of predictors p exceeds the number of observations n. From exact analytical calculation of the marginal density under this prior, we give a new closed-form Bayesian variable selection criterion. We also give some numerical results.
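
For context, the standard form of Zellner's g-prior in the normal linear model is shown below. The notation is the usual textbook one and is assumed here rather than taken from the talk.

```latex
y \mid \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I_n), \qquad
\beta \mid \sigma^2 \sim N_p\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right)
```

When p > n the matrix X^T X has rank at most n and cannot be inverted, so the prior as written is undefined. That is the gap an extension to p > n has to close; the abstract does not state the specific form of the extension.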

This is joint work with Professor E. George at the University of Pennsylvania.

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Wed 23rd April 2008
4:00pm
Department of Statistics and Actuarial Science University of Waterloo Waterloo, Ontario CANADA N2L 3G1
Analysis of Correlated Data

Correlated data, including longitudinal and clustered data, arise frequently in health and medical studies. These data may occur when subsampling the primary sampling units or repeatedly collecting measurements over time for subjects in the study. As is well-known, standard univariate analysis methods may not be suitable to handle correlated data. There has been extensive research interest in analysis methods for such data. However, a number of challenges, such as how to accommodate complex mean and association structures of data and how to facilitate different missing data mechanisms, still remain. In this talk, I will discuss some marginal and conditional methods for analysis of correlated data.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 22nd April 2008
11:00am
Department of Statistics and Actuarial Science Simon Fraser University Burnaby BC, Canada
Application of Bayesian P-splines to the Stratified Two-Stage
Two-stage capture-recapture experiments are frequently used to monitor
wild animal populations. At the first stage of the experiment, a
sample of individuals is captured, marked and returned to the
population. At the second stage (distinct in time and/or geographic
location) a new sample is collected that contains both marked and
unmarked individuals. This information can then be used to estimate
the probability of capture at the second site and hence the size of
the population.
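
The two-stage logic in this paragraph corresponds to the classical pooled Lincoln-Petersen estimator. The Python sketch below uses hypothetical counts; it is the simple unstratified estimator, not the Bayesian P-spline model developed in the talk.

```python
def lincoln_petersen(n_marked, n_second, n_recaptured):
    """Pooled two-stage estimate: return (population size, second-stage capture probability)."""
    if n_recaptured == 0:
        raise ValueError("no marked animals recaptured; the estimate is undefined")
    # Fraction of marked animals seen again estimates the second-stage capture probability
    capture_prob = n_recaptured / n_marked
    # Equivalent to n_second / capture_prob
    population = n_marked * n_second / n_recaptured
    return population, capture_prob


# Hypothetical counts: 200 marked, 150 caught at stage two, 30 of those marked
n_hat, p_hat = lincoln_petersen(n_marked=200, n_second=150, n_recaptured=30)
```

This pooled version assumes a constant capture probability over the whole study, which is the assumption that the heterogeneity discussed in the abstract calls into question.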
 
One common difficulty is that studies of this type are usually
conducted over long periods of time (weeks or months) and conditions
affecting the capture of animals can change greatly during these
periods. Ignoring this heterogeneity produces biased estimates of the
population size and underestimates of the uncertainty. However, simple
models that stratify the data to account for changes day-to-day
require many parameters, and very often the data is too sparse to
estimate all parameters in these models or to provide estimates of the
population size with adequate precision.
 
The objective of this work is to develop models that smooth the
estimates of population size on each day through application of
Bayesian P-splines. The resulting models are flexible enough to allow
for reasonable changes, and provide increased precision through
sharing information about the population size across neighbouring
days. Extensions are presented that allow for deviations from the
spline when required.
 
The problem is motivated by a study of coho salmon on the Cheakamus
River in Squamish, BC, conducted by BC Hydro. Results will be
presented for this data.
 
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 17th April 2008
4:00pm
Associate Professor Departments of Public Health Sciences and Statistics University of Toronto
The multiplicity problem in Genome-Wide Association (GWA) studies
Due to the recent advance in high-throughput genotyping technology, many current disease mapping studies are moving towards the Genome-Wide Association (GWA) study design. One of the main methodological challenges in the analyses of GWA studies is the inherent problem of high-dimensional hypothesis testing, because hundreds of thousands or more genetic markers are investigated simultaneously. I will discuss some of the False Discovery Rate (FDR)-based approaches, focusing on three characteristics of GWA studies: data are massive and correlated, signals are sparse, and strength of the signals is weak.

The topics to be discussed include:
  1. What are the boundaries for signal strength and sparsity for true discovery?
  2. How to improve power by utilizing prior information in a stratified fashion (Sun et al., 2006; Celia et al., 2007)?
  3. What is the comparative performance between the stratified method and the weighted p-value approach (Roeder et al., 2006)? Can we unify the two?
  4. Does it make a difference in an application to an on-going GWA study of data from the Illumina 1M chip in DCCT/EDIC for association with diabetic retinopathy using previous genome-wide linkage results as prior information?
This talk includes past and on-going work with colleagues (alphabetically), Shelley Bull, Radu Craiu, Celia Greenwood, Andrew Paterson, Yun Joo Yoo from Toronto, Jiashun Jin from Purdue and Dan Nicolae from Chicago.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 15th April 2008
11:00am
Associate Professor Department of Statistics University of Toronto 100 St. George Street Toronto, ON M5S 3G3 Canada
Learn from Thy Neighbour: Parallel-Chain Adaptive MCMC
A considerable amount of effort has been recently invested in developing a comprehensive theory for adaptive MCMC. In comparison, there are fewer adaptive algorithms designed for practical situations. I will review some of the theoretical approaches used for proving convergence of non-Markovian adaptation schemes and will discuss scenarios for which the original adaptive Random-Walk Metropolis is unsuitable. Alternative adaptive schemes involving inter-chain and regional adaptation are discussed. Some of the proposed solutions involve theoretical questions that are still open.
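For readers unfamiliar with adaptive schemes, here is a minimal one-dimensional Python sketch of a random-walk Metropolis sampler whose proposal scale is tuned on the fly. It is a simple single-chain illustration, not the inter-chain or regional adaptation discussed in the talk. The target acceptance rate, the step-size exponent and all names are assumptions.

```python
import math
import random


def adaptive_rwm(log_target, x0, n_iter, target_accept=0.44, seed=0):
    """1-D random-walk Metropolis with a proposal scale adapted toward a target acceptance rate."""
    rng = random.Random(seed)
    x, log_scale = x0, 0.0
    lp = log_target(x)
    samples = []
    for i in range(1, n_iter + 1):
        proposal = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        lp_prop = log_target(proposal)
        accept_prob = math.exp(min(0.0, lp_prop - lp))
        if rng.random() < accept_prob:
            x, lp = proposal, lp_prop
        # Robbins-Monro update: the step size i**-0.6 shrinks, so adaptation diminishes over time
        log_scale += (accept_prob - target_accept) / i ** 0.6
        samples.append(x)
    return samples, math.exp(log_scale)


def std_normal_logpdf(x):
    """Log-density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


samples, final_scale = adaptive_rwm(std_normal_logpdf, x0=5.0, n_iter=20000)
```

Because the proposal depends on the chain's history, the process is no longer Markovian. The shrinking step size is one common way to keep the adaptation from disturbing convergence.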
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 10th April 2008
11:00am
Department of Statistics, UBC
Finding gene networks in functional genomics data using graph processes
Genes, the fundamental building blocks of life, act together (often through their derived proteins) in modules such as protein complexes and molecular pathways to achieve a cellular function such as DNA repair and cellular transport. A current emphasis in genomics research is to identify gene modules from gene profiles, which are measurements (such as a mutant phenotype or an expression level), associated with the individual genes under conditions of interest; genes in modules often have similar gene profiles. Clustering groups of genes with similar profiles can hence deliver candidate gene modules.

Pairwise similarity measures derived from these profiles are used as input to the popular hierarchical agglomerative clustering algorithms; however, these algorithms offer little guidance on how to choose candidate modules and how to improve a clustering as new data becomes available. As an alternative, there are methods based on thresholding the similarity values to obtain a graph; such a graph can be analyzed through (probabilistic) methods developed in the social sciences. However, thresholding the data discards valuable information and choosing the threshold is difficult.

Extending binary relational analysis, we exploit ranked relational data as the basis for two distinct approaches for identifying modules from genomic data, both based on the theory of random graph processes. We propose probabilistic models for ranked relational data that allow candidate modules to be accompanied by objective confidence scores and that permit integration of external information on gene-gene relationships.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 8th April 2008
11:00am
Simon Peacock, Dean, Faculty of Science and André Zandstra, Director of Development
Faculty of Science University of British Columbia
Fundraising and Development Campaign: Creating Opportunities
Simon and André will review a Faculty of Science campaign to raise donations for faculty chairs, student scholarships, basic research, new buildings, etc. This is part of a larger UBC campaign with a target of one billion dollars or more, yes Billion! After the review, there will be plenty of opportunity for discussion. Please attend with your ideas about: (1) areas of excellence in the department; (2) opportunities to move forward; (3) raising and spending money.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 1st April 2008
11:00am
Jun Zhu Department of Statistics University of Wisconsin - Madison
Statistical Modeling in Forest Entomology: Space, Time, and Multiple Species
Ecological data in forest entomology often involve multiple species across space and over time. Of particular interest is the impact of two bark beetle groups on tree mortality and the subsequent gap formation over time in a Wisconsin plantation. Traditional Markov random field models are extended to account for both spatial and temporal autocorrelation, as well as multiple response variables. A Bayesian hierarchical modeling approach is adopted for statistical inference and Markov chain Monte Carlo algorithms are devised for obtaining the posterior distributions of model parameters. Model checking and comparison are performed based on posterior predictive distributions.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 25th March 2008
11:00am
2748 Howe St., Ottawa, Ontario, K2B 6W9
Cancelled
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 18th March 2008
11:00am
Department of Statistics Cornell University
Data-Driven Diagnostics and Model Building in Nonlinear Dynamics
This talk examines the problem of data driven model building for systems thought to be described by nonlinear differential equations. I argue that lack of fit may be best represented as an unknown, smooth, additive input into these equations. Treating such inputs as a residual, standard diagnostic tools may be applied. The problem becomes less straightforward when only some components of a system are observed and I discuss approaches to dealing with this. The standard model building paradigm, however, does not extend to more complex modeling choices such as the use of higher-order systems or extra components. I show that techniques from the field of Chaotic Data Analysis may be used to indicate where such choices are appropriate. These results also provide some cautionary lessons about the limitations of data-driven inference in such systems.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 11th March 2008
11:00am
Department of Statistics Texas A&M University College Station, TX 77843-3143
Optimal estimators for semiparametric regression models with missing
We consider regression models with responses that are allowed to be missing at random. The models are semiparametric in the following sense: we assume a parametric (linear or nonlinear) model for the regression function but no parametric form for the distributions of the variables; we only assume that the errors have mean zero and are independent of the covariates. For estimating general expectations of functions of covariate and response (involving the mean response as a simple special case) we use an easy-to-implement weighted imputation estimator (adapting empirical likelihood ideas). The estimator uses all model constraints. Provided an efficient estimator for the model parameter is used, it is therefore efficient in the sense of Hajek and Le Cam. We illustrate our results with simulation examples and discuss related questions.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 19th February 2008
11:00am
UBC Department of Statistics
Combining Measurements With Deterministic Model Outputs
The main topic of this talk is how to combine outputs from deterministic models with measurements for the prediction of air pollutants or other meteorological variables. We consider two different approaches to this problem. The first approach is the Bayesian melding model proposed by Raftery and Fuentes (2005). Because the melding model is Bayesian, we can extend it to incorporate other components such as ensemble models and reversible jump MCMC for variable selection. However, the BM model is purely a spatial model and cannot handle space-time data. As an alternative to the BM model, we propose univariate and multivariate spatial-temporal models. We assume the spatial and temporal correlations are separable and use an AR process to model the temporal correlation. We use both the Bayesian melding and spatial-temporal models to analyze ozone air pollution data.
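The separability assumption can be written out as follows. This is a sketch under two assumptions that are not stated in the abstract: the AR process has order one, and the process variance is constant. The symbols are illustrative.

```latex
\operatorname{Cov}\{Z(s,t),\, Z(s',t')\} = \sigma^2 \, \rho_S(s, s') \, \phi^{|t - t'|}, \qquad |\phi| < 1
```

Here ρ_S is a spatial correlation function and φ is the AR(1) coefficient. The covariance factors into a purely spatial part and a purely temporal part, which is what separability means.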
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 12th February 2008
11:00am
Statistician CSIRO Mathematical & Information Sciences
Monitoring the health of Queensland’s rivers: steps to designing an optimal spatial sampling scheme
Spatial sampling design is a key step in developing an optimal large-scale, multi-objective aquatic monitoring program. Aquatic systems can be complex and irregular, thus it is critical to ensure that a spatial design is statistically valid, implementable, and flexible for meeting objectives such as assessing ecosystem health. Design of monitoring programs for assessing aquatic ecosystem health has been an active area of research in the United States in recent years, especially in relation to the US Environmental Monitoring and Assessment Program (EMAP). In particular, there have been innovative developments on the statistical front of spatial sampling design in order to achieve more efficient, flexible and practical monitoring and assessment. One such development is the generalised random-tessellation stratified (GRTS) design (Stevens and Olsen, 2004 J. Amer. Stat. Assoc. 99, 262-278) which generates a spatially-balanced sample of natural resource populations such as a stream network. This talk motivates adoption of this approach, provides some detail of how it works, and illustrates its application to the redesign of Queensland’s ambient aquatic ecosystem health monitoring and assessment program.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 5th February 2008
11:00am
Department of Statistics and Actuarial Science Simon Fraser University, Surrey
MCMC Methods for Dynamic Systems Models
Traditional statistical methods for estimating parameters from differential equation models are haunted by prohibitive posterior topologies. Methods like nonlinear least squares and MCMC based on numerical solutions to a system of differential equations often get stuck in local maxima and/or are painfully slow to converge. In this talk I highlight the reasons for these common problems and present an alternative collocation tempering approach for differential equation models. The method uses smoothing to ease navigation and improve the fit to the data. Applications to neurophysiology and industrial chemical engineering highlight the methodology and the unique statistical problems of working with differential equation models.
Statistics / BRG
GEOG 212, 1984 West Mall, UBC
Mon 4th February 2008
4:00pm
Department of Epidemiology and Department of Statistics, University of California, Los Angeles 90095-1772, U.S.A.
The Need for Statistical Theory and Methods for Inference on Nonidentified Parameters

Nearly all statistical theory and methods begin with the assumption that the parameter of interest is identified by the data-generating process. Indeed, this identification assumption functions as a fundamental axiom of frequentism, and is in turn adopted uncritically in “objective” Bayesian theory and methods. But the assumption is justified only to the extent it is enforced by a perfectly randomized study design (purely random in allocation, sampling, and measurement error).

The identification assumption has no justification at all for observational studies in the health and social sciences. Tragically, indiscriminate use of methods that rely on the assumption has contributed to costly medical mistakes. An appropriate if unlikely response by the statistics community would be to cease reliance on the identification assumption, and adopt instead theory and methods that allow the target parameters to remain nonidentified by the data generating process alone.

The present talk will describe some theories and methods for inference on nonidentified parameters as they have developed in the health sciences over the past decade or so. Once one sees that identification does not follow from the study design and execution, only informative prior distributions remain to narrow the possibilities. Thus, to the extent these theories provide identification, they have a large informative-Bayesian component.

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 31st January 2008
4:00pm
Drs Tom Koch and Ken Denike
The Statistics in the Map: Rewriting John Snow's South London study
In biostatistics and geostatistics, in epidemiology and public health, few studies are more fundamental than John Snow's study of the 1854 cholera epidemic in South London, England. Both in defining the problem and in modeling it, Snow's work was notable for its use of mapping. In recent years, this fundamental study has received attention from various authors, including the speakers, who have recently described fundamental errors in Snow's methodology. In this talk the study is reviewed as an early example of cartographic and geostatistical disease modeling whose limits can be corrected using modern Bayesian techniques. New work to be presented considers varying approaches that combine statistical and theoretical approaches to what, in Snow, could be defined as a cartographic problem. The lecture will be of interest to those in epidemiology and public health as well as to biostatisticians and geographers.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 29th January 2008
4:00pm
Professor of Statistics The Wharton School Department of Statistics 3730 Walnut Street Philadelphia, PA 19104-6340
Large-Scale Multiple Testing: Finding Needles in a Haystack
Due to advances in technology, it has become increasingly common in scientific investigations to collect vast amounts of data with complex structures. Examples include microarray studies, fMRI analysis, and astronomical surveys. The analysis of these data sets poses many statistical challenges not present in smaller scale studies. In these studies, it is often required to test thousands and even millions of hypotheses simultaneously. Conventional multiple testing procedures are based on thresholding the ordered p-values. In this talk, we consider large-scale multiple testing from a compound decision theoretical point of view by treating it as a constrained optimization problem. The solution to this optimization problem yields an oracle procedure. A data-driven procedure is then constructed to mimic the performance of the oracle and is shown to be asymptotically optimal. In particular, the results show that, although the p-value is appropriate for testing a single hypothesis, it fails to serve as the fundamental building block in large-scale multiple testing. Time permitting, I will also discuss simultaneous testing of grouped hypotheses.
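As an example of the conventional procedures that threshold ordered p-values, here is a Python sketch of the Benjamini-Hochberg step-up rule. It illustrates the baseline the talk argues against, not the oracle or data-driven procedure it proposes. The p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose ordered p-value falls under the line rank * q / m
        if p_values[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])


# Hypothetical p-values from ten tests
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, q=0.05)
```

With these values only the two smallest p-values fall under their thresholds (0.005 and 0.010), so `rejected` contains indices 0 and 1.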
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 24th January 2008
4:00pm
Postdoctoral Fellow, Dept of Statistics, UBC
Finite normal mixture copulas for multivariate discrete data modeling
Multivariate discrete data occur in several disciplines such as epidemiology, marketing, criminology, industrial statistics, to name just a few. However, flexible models for such data are not widely available and usually hard to fit. Thus, it seems that there is a lack of models appropriate for multivariate discrete data with negative dependence and/or flexible marginal choices. In this paper, we use copulas to overcome this problem. Modeling discrete data via copulas is still in its infancy. A new family of copulas is introduced that provides flexible dependence structure while being tractable and simple to use for multivariate discrete data modeling. The construction exploits finite mixtures of uncorrelated normal distributions. Accordingly, the cumulative distribution function is simply the product of univariate normal distributions. At the same time, however, the mixing operation introduces association. The properties of the new family of copulas are examined. A concrete epidemiological application with multivariate count data is given.
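One plausible reading of the construction, with all notation assumed rather than taken from the paper, is a K-component mixture in which each component has independent normal margins:

```latex
F(x_1, \dots, x_d) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{d} \Phi\!\left(\frac{x_j - \mu_{kj}}{\sigma_{kj}}\right),
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1
```

Within each component the distribution function is a product of univariate normal distribution functions, so it is cheap to evaluate. Dependence between coordinates arises only because the components have different means or variances. The copula would then be obtained by composing F with the inverses of its univariate margins.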
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 22nd January 2008
11:00am
Dept of Mathematics & Statistics, Washington State University
Tail Dependence of Multivariate Distributions
The tail dependence of a multivariate distribution describes the limiting proportion of exceedance of some margins over a large threshold given that the other margins have exceeded that threshold, and can be used in the analysis of dependence among extremal events. The bivariate tail dependence is frequently studied via copulas. In this talk, we discuss an alternative method to derive tractable formulas of multivariate tail dependence for distributions whose copulas are not explicitly available. Our method depends only on the tail analysis and does not involve the marginal transforms on the entire distributions. Combined with closure properties of total positivity, our method also enables us to establish the monotonicity of tail dependence with respect to the heavy tail index. The bivariate elliptical distribution and bivariate Pareto distribution are discussed throughout to illustrate the results.
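For reference, the usual bivariate upper tail dependence coefficient can be written in two equivalent ways. The notation is the standard one and is assumed here: F_1 and F_2 are continuous margins and C is the copula of (X_1, X_2).

```latex
\lambda_U = \lim_{u \to 1^-} P\{F_2(X_2) > u \mid F_1(X_1) > u\}
          = \lim_{u \to 1^-} \frac{1 - 2u + C(u, u)}{1 - u}
```

The second expression shows why an explicit copula is convenient in the bivariate case. It also shows why a method based only on tail analysis is useful when C is not available in closed form.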
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 17th January 2008
4:00pm
Professor Maengseok Noh
Division of Mathematical Sciences, Pukyong National University, Busan 608-737, Korea E-mail: msnoh@pknu.ac.kr
The use of REML estimation for fitting GLMMs, NLMMs and HGLMs
The restricted maximum likelihood procedure is useful for inferences about variance components in mixed linear models. However, its extension to generalized linear mixed models (GLMMs), nonlinear mixed effects models (NLMMs) and hierarchical generalized linear models (HGLMs) has encountered some difficulties. Numerical integration such as Gauss-Hermite quadrature is generally not recommended when the dimensionality of the integral is high. Approximate methods such as penalized quasi-likelihood estimators may have severe biases when analyzing binary data. In this talk I introduce the hierarchical likelihood method which resolves these difficulties. Numerical studies show how the proposed method overcomes them. We also discuss how the restricted maximum likelihood estimating equations for mixed linear models can be modified in more general models.
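The remark about dimensionality can be illustrated with a small Python sketch of tensor-product Gauss-Hermite quadrature. It uses the three-point rule for a standard normal weight and a hypothetical integrand, not a GLMM likelihood.

```python
import itertools
import math

# Three-point Gauss-Hermite rule for a standard normal weight:
# exact for polynomials up to degree 5 in each coordinate
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def gauss_hermite_expectation(g, dim):
    """Approximate E[g(Z)] for Z ~ N(0, I_dim) on a tensor-product grid; also count evaluations."""
    total = 0.0
    n_evals = 0
    for combo in itertools.product(range(3), repeat=dim):
        point = [NODES[i] for i in combo]
        weight = math.prod(WEIGHTS[i] for i in combo)
        total += weight * g(point)
        n_evals += 1
    return total, n_evals


# Hypothetical integrand: E[Z1^2 * Z2^2] equals 1 for independent standard normals
value, n_evals = gauss_hermite_expectation(lambda z: z[0] ** 2 * z[1] ** 2, dim=2)
```

The grid has 3^dim points: 9 evaluations in two dimensions but 59,049 in ten. This growth is the reason quadrature becomes impractical for high-dimensional random-effects integrals.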
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 15th January 2008
11:00am
Timothy Ng
Dr. Timothy Ng Chinese University of Hong Kong and Seoul National University
Statistical Inference for GARCH type Models
Since Engle's work, ARCH models have received considerable attention among economists and various types of generalizations to the ARCH models have been proposed. Among these models, those incorporating the notion of fractional-differencing and non-stationarity are the most interesting ones as they offer many challenging theoretical problems. One commonly used technique to estimate the parameters in the ARCH type models is quasi-maximum likelihood estimation (QMLE). To establish the asymptotic properties of the QMLE, one usually has to impose stringent assumptions; see Robinson and Zaffaroni (2006) and Straumann (2005). They have to assume that a stationary solution to the true model exists and that this solution has some finite moments. These two assumptions are too restrictive to be applied to non-stationary GARCH models exhibiting explosive behavior. Also, there are still controversies over the stationarity of certain fractional-differencing models. In this talk, I will give a brief review of the well-established results for stationary GARCH models and present new results for two generalized ARCH-type models, namely the non-stationary GARCH model (see Jensen and Rahbek, 2004) and the fractionally-integrated GARCH model (see Baillie et al., 1996). The regularity conditions under which the strong consistency and asymptotic normality of the QMLE of the fractionally-integrated GARCH model hold are given in this presentation. In addition, the results for non-stationary GARCH(1,1) models in Jensen and Rahbek (2004) will be extended to general non-stationary GARCH(p,q) models.
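
To show what the QMLE maximizes, here is a minimal Python sketch of the Gaussian quasi-log-likelihood for a GARCH(1,1) model. The function name, the initial variance argument and the toy return series are assumptions made for illustration.

```python
import math


def garch11_quasi_loglik(returns, omega, alpha, beta, sigma2_init):
    """Gaussian quasi-log-likelihood of a GARCH(1,1) model for a series of returns."""
    sigma2 = sigma2_init
    loglik = 0.0
    for r in returns:
        # Contribution of r under a N(0, sigma2) working density
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + r * r / sigma2)
        # Variance recursion: next variance = omega + alpha * r^2 + beta * current variance
        sigma2 = omega + alpha * r * r + beta * sigma2
    return loglik


# Hypothetical returns and parameter values
toy_returns = [0.5, -1.0, 0.2]
value = garch11_quasi_loglik(toy_returns, omega=0.1, alpha=0.1, beta=0.8, sigma2_init=1.0)
```

The QMLE is the parameter vector (omega, alpha, beta) that maximizes this function. The working density is Gaussian even when the true innovations are not, which is why the estimator is called "quasi".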
