Department Seminars 2001

DATE/PLACE: Tuesday, November 27, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Analyzing Dynamic Regimes using Structural Nested Models
SPEAKER: Mark van der Laan
Biostatistics, UC Berkeley

In many longitudinal studies one is concerned with analyzing the causal effects of dynamic treatment regimes on an outcome of interest. A dynamic treatment regime is a rule which, at each point in time, determines the assigned treatment by evaluating a particular function of scores extracted from the observed history of the subject. Robins introduced and analyzed structural nested failure time models, which model a so-called counterfactual blip function: the causal effect of a final treatment action at time $t$, adjusted for the whole past of the subject, which is assumed to include all confounders of treatment. If the dynamic treatment regimes to be compared are functions only of some scores included in the observed past, then the parameter of interest is the blip function that adjusts only for these scores. We therefore introduce marginal structural nested models. In particular, we use the generalized blip function as a building block for modelling the causal effect of a dynamic treatment regime on the response distribution. We discuss estimators of this blip function and of the distribution of the response to a dynamic treatment regime in experimental and observational studies.


DATE/PLACE: Tuesday, November 20, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Finding recombination breakpoints in HIV sequences
SPEAKER: Jinko Graham
Statistics, Simon Fraser University
Phylogenetic profiling is a quick method for graphically displaying possible recombination breakpoints in a set of aligned molecular sequences. I'll talk about our efforts so far to evaluate the evidence for breakpoints in these profiles and will present some results from an analysis of HIV sequences sampled from an individual with a diverse virus population. The talk is aimed at a general audience; no advanced knowledge of statistics or molecular biology is assumed.


DATE/PLACE: Wednesday, November 13, 2001, 16:00
Room 308, Henry Angus Building
2053 Main Mall, UBC
TITLE: Elicit data, not prior: On using expert opinion in ecological studies
SPEAKER: Professor Subhash R. Lele
Dept. of Mathematical & Statistical Sciences
University of Alberta

Many ecological studies suffer from insufficient data on the phenomenon under study. Limited data usually lead to a flat likelihood and uncertain inferences. Although hard data or actual observations may be difficult to obtain, there is usually a wealth of information in the form of expert opinion. A common approach to incorporating expert opinion in statistical analysis is via the Bayesian paradigm. The Bayesian approach, however, faces several problems: 1) opinions of unreliable experts adversely affect conclusions; 2) eliciting prior beliefs in terms of probability distributions is unnatural, because field experts are generally unfamiliar with statistical models (statistical models are constructs of statisticians, and experts do not necessarily think in terms of them); and 3) eliciting priors for many parameters simultaneously and consistently is nearly impossible. In this paper, I propose an alternative approach that elicits expert opinion in terms of data, or guess values. Such expert guess values are easier to obtain than prior distributions on the parameters of a statistical model. These expert guesses, or elicited data, are then combined with the observed data using a hierarchical model. An important feature of this approach is that, unlike the Bayesian approach, even an expert opinion that is negatively associated with the truth (that is, even an intentionally misleading expert opinion) improves the statistical analysis. I will illustrate an application of this methodology to predicting the presence of certain shrew species in western Montana.


DATE/PLACE: Tuesday, November 13, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: New statistical challenges in multimedia databases
SPEAKER: Nando de Freitas
UBC Computer Science Department
A new frontier for statistics has arisen with the expansion of data (in the form of images, video, DNA micro-arrays, text, sounds and other media) in digital databases and on the world-wide web. The models used to describe these databases tend to be massive: we are in the realm of models with thousands or millions of parameters. This expansion of data brings many new, varied and exciting applications. These include the design of search engines for information retrieval with images, sounds and text; the construction of browsing tools for digital databases; and the combination of different sources of information in useful multimedia applications, such as machine translation, automatic annotation of text with images and automatic illustration of images with words. The latter application has clear ties with object recognition. There are four areas of research within this frontier that could benefit from input from statisticians: designing more expressive and parsimonious statistical models, defining utilities and performance measures, constructing algorithms that perform well in high dimensions, and exploring new applications.


DATE/PLACE: Tuesday, November 6, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: What contribution can statistics and statisticians make to ethical review?
SPEAKER: Margaret Shotter
until recently Department of Mathematics and Statistics
University of Edinburgh, Scotland
Experimentation on human subjects is essential for progress in many areas of research in the medical and behavioural sciences. The subjects of such studies must be given the highest standards of care and respect for their person and their dignity. This talk will be a general review of some statistical issues which have relevance for Research Ethics Boards.

DATE/PLACE: Thursday, November 1, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Biostats Research Group
TITLE: Some Statistical "Challenges" in the Health Sciences
SPEAKER: Rob Balshaw, SFU and Syreon
and Daryl Lin, Tulane University and Syreon
Syreon Corporation is a local, rapidly growing contract research organisation conducting cutting-edge investigations in the health sciences. Our research programs range from immunology to psychiatric medicine, pharmacokinetics to pharmacoeconomics and even a bit of genomics. The Biometrics Group at Syreon is responsible for providing statistical support to the rest of our research team. In this talk, we will give a brief summary of several of our ongoing projects, highlighting some interesting statistical "challenges" we've encountered. Along the way, we'll see applications of mixed-effects models for longitudinal data, survival analysis with time-dependent covariates, adjustment for differences in baseline covariates using propensity scores, and others. Time permitting, we'll do a bit of brainstorming: how would *you* have handled these "challenges"?

DATE/PLACE: Tuesday, October 30, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Bayesian Analysis of the Dynamic Hierarchical Model with Application to a Changepoint Problem
SPEAKER: Jerome Asselin
Department of Statistics, UBC
Dynamic hierarchical models are a generalization of dynamic linear models. These models allow for a multi-stage hierarchy in the state equation of dynamic linear models. Each equation of this hierarchy is subject to a noise component, enabling a wide class of correlation structures in multivariate time series. Assuming normal data and known variances of the noise components, Gamerman and Migon presented an inference methodology for these models. In this presentation, their work is extended to allow for unknown variances. We also show how this class of models is readily applicable to changepoint problems. A broader class of switching models could be written as special cases of dynamic hierarchical models, since their inner hierarchy provides a tool for easily switching the driving parameters of an observation series.
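For readers less familiar with the building block being generalized, the sketch below filters the simplest dynamic linear model, a univariate local-level model with known variances, using the standard Kalman recursions. It is an illustration only: it has a single state equation rather than the multi-stage hierarchy of a dynamic hierarchical model, it does not estimate the variances, and the function name and the parameters `V` and `W` are hypothetical.

```python
def local_level_filter(ys, m0, c0, V, W):
    """Kalman filter for the local-level dynamic linear model
         y_t     = theta_t + v_t,        v_t ~ N(0, V)
         theta_t = theta_{t-1} + w_t,    w_t ~ N(0, W)
    with known variances V and W and prior theta_0 ~ N(m0, c0).
    Returns the filtered means and variances of theta_t given y_1..y_t."""
    m, c = m0, c0
    means, variances = [], []
    for y in ys:
        a, r = m, c + W          # prior for theta_t given y_1..y_{t-1}
        f, q = a, r + V          # one-step forecast mean and variance for y_t
        k = r / q                # adaptive coefficient (Kalman gain)
        m = a + k * (y - f)      # posterior mean of theta_t
        c = r - k * r            # posterior variance of theta_t
        means.append(m)
        variances.append(c)
    return means, variances
```

For example, `local_level_filter([2.0], 0.0, 1.0, V=1.0, W=0.0)` returns `([1.0], [0.5])`: with equal prior and observation variances, the posterior mean sits halfway between the prior mean and the observation.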

DATE/PLACE: Thursday, October 18, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Biostats Research Group
TITLE: A Data Analysis of Patients with Neurofibromatosis 2
SPEAKER: Dana Aeschliman
Statistics, UBC
To evaluate clinical and molecular predictors of mortality in people with neurofibromatosis 2 (NF2), we analyzed the mortality experience of 350 patients in the United Kingdom NF2 registry using several different methods. In a Cox model with three covariates (age at diagnosis, an indicator of treatment at a specialty center, and an indicator of the presence of at least one intracranial meningioma), only age at diagnosis (AGEDIAG) is statistically significant. In three separate models with an exchangeable correlation structure, each having only one explanatory variable, all three covariates are found to be important. Schemper's measure v shows that the proportional hazards model explains about 54% of the variation in death times. We find a statistically significant, though moderate, intra-familial correlation of survival times.


DATE/PLACE: Tuesday, October 9, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Improved Access to Statistics Canada's Data through the Research Data Centres
SPEAKER: Lee Grenon
B.C. Interuniversity Research Data Centre
University of British Columbia

Researchers at SFU, UBC and UVic now have greater access to Statistics Canada's household survey data for conducting academic and policy research. Lee Grenon from Statistics Canada will introduce the newly opened British Columbia Interuniversity Research Data Centre (BCIRDC). This new centre is part of a national network of nine research data centres opening at universities across Canada. The new BCIRDC provides faculty and graduate students with access to Statistics Canada's master data files, which offer much more in-depth information than the public-use datasets. Most importantly, the centre provides access to data from Statistics Canada's longitudinal surveys, which are a very powerful resource for quantitative analyses of child development, economic, educational, family, health, income, labour and social issues.

The presentation will explain:
What is the Research Data Centres Program?
What data will be available to researchers?
What type of research is being conducted at the RDCs?
Who is eligible to apply for access to the RDC Program?
How to apply for access to a Research Data Centre.

The opening of the BCIRDC and other Research Data Centres across Canada is a unique opportunity for faculty and graduate students to collaborate on a wide range of multidisciplinary research projects. To find out more, please join us on
Tuesday, October 9th at 4:00pm
Room 301 Leonard Klinck Building

For further information on the BC Interuniversity Research Data Centre, please visit our website at: http://data.library.ubc.ca/rdc/

DATE/PLACE: Thursday, October 4, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Biostats Research Group
TITLE: Comparison of Methods for Multivariate Binary Responses
SPEAKER: A.H.M. Mahbub-ul Latif
Statistics, UBC
Among the existing methods for analysing multivariate familial binary responses, we discuss latent variable models and estimating-equation-based methods. A brief description of the multivariate Plackett distribution is given, and its role in developing the estimating-equation-based methods is pointed out. A simulation study compares maximum likelihood and estimating-equation-based methods for estimating the parameters of the multivariate logistic model. The multivariate logistic and probit models are compared for estimating conditional probabilities of interest in a genetics context, together with their standard errors. To handle multivariate binary data with arbitrary family structures, we have a new implementation of the GEE2 method for familial data; this routine uses automatic differentiation to compute the Hessian matrix.
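The bivariate case of the Plackett construction is short enough to show. Given two marginal success probabilities and an odds ratio ψ, the joint probability P(Y1 = 1, Y2 = 1) is the admissible root of a quadratic. The sketch assumes this bivariate form only (the talk concerns the multivariate distribution and its use in estimating equations), and the function names are hypothetical.

```python
import math

def plackett_p11(p1, p2, psi):
    """P(Y1 = 1, Y2 = 1) for two binary variables with marginal success
    probabilities p1, p2 and odds ratio psi = p11 * p00 / (p10 * p01).
    Solves (psi - 1) p11^2 - b p11 + psi p1 p2 = 0 and keeps the root in [0, 1]."""
    if abs(psi - 1.0) < 1e-12:
        return p1 * p2                          # independence
    b = 1.0 + (psi - 1.0) * (p1 + p2)
    disc = b * b - 4.0 * psi * (psi - 1.0) * p1 * p2
    return (b - math.sqrt(disc)) / (2.0 * (psi - 1.0))

def cell_probs(p1, p2, psi):
    """Return the 2x2 cell probabilities (p11, p10, p01, p00)."""
    p11 = plackett_p11(p1, p2, psi)
    return p11, p1 - p11, p2 - p11, 1.0 - p1 - p2 + p11
```

As a check, `cell_probs(0.3, 0.6, 3.0)` returns cells whose cross-product ratio p11·p00/(p10·p01) recovers 3 and whose sum is 1; the same minus-sign root is the admissible one for both ψ > 1 and ψ < 1.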

DATE/PLACE: Tuesday, September 25, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Statistical Shape Analysis Using a General Class of Complex Elliptical Shape Distributions
SPEAKER: Prof. Dipak Dey
Head of Department of Statistics
University of Connecticut
We develop a general class of complex elliptical shape distributions on the complex sphere. This class contains many shape distributions, including the complex Watson, Bingham and angular central Gaussian distributions and a host of others. We study properties of this class of distributions and apply the distribution theory to modeling shapes in two dimensions. Maximum likelihood and Bayesian methods of estimation are developed, and credible regions for shapes are obtained using Markov chain Monte Carlo methods. We also derive methods for assessing differences between two uncorrelated populations of uncorrelated shapes. Our results are illustrated through an example on estimating the shape of, and comparing, male and female gorilla skulls.

DATE/PLACE: Thursday, September 20, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Biostats Research Group
TITLE: A two-component marginal model for correlated zero-inflated count data
SPEAKER: Melissa Dobbie, PhD
Statistics, UBC
The two-component approach to modelling zero-inflated count data is to first model presence or absence (non-zero versus zero counts) by a logistic model and then, conditional on presence, model the non-zero counts using a truncated discrete distribution. We focus on the two-component Poisson approach, which models the non-zero counts using the truncated Poisson distribution, and extend this model to take account of possible serial dependence between repeated counts, using a marginal modelling approach. Details of the extension and methods for checking and selecting models will be presented. The proposed methodology is applied to modelling some counts of Northern Bandicoots (Isoodon macrourus), which were collected as part of a fire ecology experiment in northern Australia.
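The independent-data version of this two-component model is easy to write down. The sketch below is a minimal illustration that assumes independent counts and no covariates, so the logistic part reduces to a constant presence probability `pi`. It gives the log-probability and the maximum likelihood fit, and it does not include the serial-dependence extension that is the subject of the talk. All names are hypothetical.

```python
import math

def hurdle_poisson_logpmf(y, pi, lam):
    """log P(Y = y) under a two-component (hurdle) Poisson model:
    P(Y = 0) = 1 - pi, and for y >= 1
    P(Y = y) = pi * Poisson(y; lam) / (1 - exp(-lam))  (zero-truncated Poisson)."""
    if y == 0:
        return math.log1p(-pi)
    log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
    return math.log(pi) + log_pois - math.log1p(-math.exp(-lam))

def fit_hurdle_poisson(counts):
    """MLE for independent counts with no covariates (needs at least one
    non-zero count). pi_hat is the proportion of non-zero counts; lam_hat
    solves lam / (1 - exp(-lam)) = mean of the non-zero counts."""
    positives = [y for y in counts if y > 0]
    pi_hat = len(positives) / len(counts)
    m = sum(positives) / len(positives)
    if m <= 1.0:
        return pi_hat, 0.0      # all non-zero counts equal 1: boundary estimate
    lo, hi = 0.0, m             # lam / (1 - exp(-lam)) > lam, so the root is below m
    for _ in range(200):        # bisection; the left-hand side increases in lam
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < m:
            lo = mid
        else:
            hi = mid
    return pi_hat, 0.5 * (lo + hi)
```

Because the likelihood factorizes into a presence part and a truncated-count part, the two pieces can be fitted separately, which is what makes the two-component formulation convenient.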

DATE/PLACE: Tuesday, September 18, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Psychometrics: What and How Are They Measuring?
SPEAKER: Andre Rupp
PhD Student, UBC

You are invited to an academic buffet (lots of brain food provided) where various items on the menu of psychometricians will be presented. This talk will allow you to sample tasty items such as classical test theory, modern item-response theory, differential item functioning, dimensionality of tests, and cognitive models. To make these items digestible, basic vocabulary in the field will be discussed and graphs and some formulas will be presented for your pleasure. The talk is designed for people new to the field who would like to have a solid overview of the field and a basic idea of what applied and theoretical psychometricians actually do. The presentation should leave you mentally saturated without any negative side effects - and you won't even have to stand in line.


DATE/PLACE: Tuesday, September 11, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Statisticians and global trade
SPEAKER: Dr. Jean Cook
Manager Markets, Composite Products & Building Systems
Forintek Canada Corp

It is the speaker's experience that statisticians can have influential roles in decision making for the resolution of international trade disputes and the formulation of policies and regulations governing global trade. However, there are very few occasions when the use of statistically sound arguments alone can influence decision makers in business or government.

In this talk the speaker's thoughts on possible careers and the role and influence of statisticians in businesses and governments engaged in global trade are presented and illustrated with real examples from Canada's wood building products sector.


DEPARTMENT'S ORIENTATION MEETING Tuesday, September 4, 2001, 15:30
Leonard S. Klinck 301
6356 Agricultural Road, UBC

DATE/PLACE: Tuesday, July 26, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Joint Workshop / Biostats Research Group
TITLE: Analyzing Choice of Transportation Mode using Spatial Binary Regression
SPEAKER: Prof. Claudia Czado
University of Technology, Munich, Germany
(currently visiting Statistics, U of Washington)

In a world of ever-increasing traffic, it becomes vital to know why and when people choose public transport options. To understand these determinants, a micro-level mobility study has been undertaken in Munich, Germany. About 160 households with 260 individuals taking about 1800 trips were recorded. In addition to detailed trip-related information such as mode of transportation, time, length, duration, weather conditions and purpose, household- and person-specific information was collected as well. This included standard demographic variables and mobility-specific variables such as car ownership and license status, as well as availability of public transport options and the postal code of the residence. Initial analyses ignored the spatial information provided by the postal code. In this talk I will introduce a binary spatial regression model, which allows for the modeling of spatial effects. Estimation is based on Markov chain Monte Carlo (MCMC) methods. The model can produce maps that may point to specific neighborhoods of low or high utilization of public transport after adjustment for trip-, person- and household-specific variables. First results on a pilot data set will be reported.


DATE/PLACE: Tuesday, July 17, 2001, 16:
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Joint Workshop / Biostats Research Group
TITLE: Semi-parametric ROC analysis to evaluate biomarkers for disease
SPEAKER: Tianxi Cai
Department of Biostatistics
University of Washington, Seattle

Receiver operating characteristic (ROC) curves are a popular method for characterizing the accuracy of diagnostic tests when the test result is not binary. Various methodologies for estimating and comparing ROC curves have been developed. One approach, due to Pepe (1997, 2000a), uses a parametric regression model $ROC_x(t) = g(h_0(t) + \beta x)$, with the baseline function $h_0(t)$ specified up to a finite-dimensional parameter. In this paper, we extend the regression models by allowing arbitrary non-parametric baseline functions. We also provide asymptotic distribution theory and procedures for making statistical inference. We illustrate our approach with datasets from two studies of cancer biomarkers. Simulation studies suggest that the extra flexibility of the semi-parametric method is gained with little loss in statistical efficiency.

This is the technical part of an earlier talk given at the Peter Wall Institute at UBC.
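To make the regression model concrete, the sketch below assumes g is the standard normal CDF Φ and a binormal-type baseline h_0(t) = a + bΦ^{-1}(t). This is a fully parametric special case, not the semi-parametric estimator of the talk. The sketch also computes the empirical ROC curve from raw case and control scores. The function names, the baseline form and the coefficient name `beta` for the covariate x are illustrative assumptions.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def parametric_roc(t, a, b, beta=0.0, x=0.0):
    """ROC_x(t) = Phi(h0(t) + beta * x) with baseline h0(t) = a + b * Phi^{-1}(t),
    for a false positive rate 0 < t < 1."""
    return PHI.cdf(a + b * PHI.inv_cdf(t) + beta * x)

def empirical_roc(cases, controls, t):
    """Empirical ROC(t): the fraction of case scores strictly above a threshold
    chosen among the control scores so that at most a fraction t of controls
    lie strictly above it (higher scores indicate disease)."""
    if t >= 1.0:
        return 1.0
    ranked = sorted(controls, reverse=True)
    threshold = ranked[int(t * len(ranked))]
    return sum(1 for s in cases if s > threshold) / len(cases)
```

With a = 0, b = 1 and beta = 0 the parametric curve is the diagonal ROC(t) = t (an uninformative marker); a positive `beta * x` lifts the whole curve, which is how the regression model describes covariate effects on accuracy.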


DATE/PLACE: Wednesday, July 11, 2001, 20:00
Conference Room 307, Peter Wall Institute
University Centre, 6331 Crescent Road, UBC
TITLE: Semi-parametric ROC analysis to evaluate biomarkers for disease
SPEAKER: Tianxi Cai
Department of Biostatistics
University of Washington, Seattle
Receiver operating characteristic (ROC) curves are a popular method for characterizing the accuracy of diagnostic tests when the test result is not binary. Various methodologies for estimating and comparing ROC curves have been developed. One approach, due to Pepe (1997, 2000a), uses a parametric regression model $ROC_x(t) = g(h_0(t) + \beta x)$, with the baseline function $h_0(t)$ specified up to a finite-dimensional parameter. In this paper, we extend the regression models by allowing arbitrary non-parametric baseline functions. We also provide asymptotic distribution theory and procedures for making statistical inference. We illustrate our approach with datasets from two studies of cancer biomarkers. Simulation studies suggest that the extra flexibility of the semi-parametric method is gained with little loss in statistical efficiency.

DATE/PLACE: Tuesday, June 19, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Markov Chain Monte Carlo Algorithm Comparisons
SPEAKER: Sijin Wen
Department of Statistics
University of British Columbia
Various Markov chain Monte Carlo algorithms are available for sampling from a posterior distribution. The random walk Metropolis algorithm is a simple scheme that is frequently used in Bayesian statistical problems. The guided walk algorithm attempts to suppress the random walk behavior of the random walk Metropolis algorithm. Other algorithms, such as the Langevin algorithm and the hybrid algorithm, use more information about the posterior distribution than the random walk Metropolis and guided walk algorithms. In this thesis, the performance of each of these four algorithms is examined in simulation studies using multivariate normal target distributions, and the algorithms are compared in terms of efficiency and convergence time. The four algorithms are also compared on the posterior distribution of the parameters in an application, given observed data.
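As a concrete reference point for the simplest of the four samplers, here is a minimal random walk Metropolis sketch with a Gaussian proposal. It assumes the target is supplied as an unnormalized log density. The step size and the standard normal target are illustrative choices, not the settings used in the thesis.

```python
import math
import random

def rw_metropolis(log_target, x0, step, n_iter, seed=0):
    """Random walk Metropolis: propose x' = x + step * z with z ~ N(0, I) and
    accept with probability min(1, pi(x') / pi(x)). Returns the chain and the
    acceptance rate."""
    rng = random.Random(seed)
    x = list(x0)
    log_p = log_target(x)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        log_p_new = log_target(proposal)
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_iter

def std_normal_log_density(x):
    # Unnormalized log density of a standard multivariate normal target.
    return -0.5 * sum(xi * xi for xi in x)
```

Because the proposal is symmetric, the acceptance ratio involves only the target densities. The Langevin and hybrid algorithms instead generate proposals using gradient information about the posterior, and non-symmetric proposals then require a Hastings correction in the acceptance ratio.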

Workshop on Inference from Genetic Data on Pedigrees, Sunday, June 10th, 2001



DATE/PLACE: Tuesday, June 5, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Combining Information by Using Maximum Weighted Likelihood Method
SPEAKER: Steven Wang
Department of Statistics
University of British Columbia
A maximum weighted likelihood method is proposed for combining all the relevant data from different sources to improve the quality of statistical inference, especially when sample sizes are moderate or small. The asymptotic properties of the maximum weighted likelihood estimator (WLE) will be presented. A procedure for adaptively choosing the weights by cross-validation is proposed. The derivation of the weighted likelihood function from the maximum entropy principle will also be presented. Saddlepoint approximations to the distributions of the linear WLE and of the WLE defined by estimating equations are derived for small sample sizes. The advantages of using the WLE will be demonstrated by simulation studies and by applications to disease mapping.
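A minimal sketch of the idea, under a simplifying assumption that is not from the abstract: normal data with a known common variance and a common mean θ. Maximizing Σ_i λ_i Σ_j log f(x_ij; θ), where λ_i is the weight given to source i, then has a closed form. The result is a weighted average of the observations. Here the weights are simply supplied by the caller; the talk's adaptive cross-validated choice of weights is not shown, and the function name is hypothetical.

```python
def weighted_likelihood_mean(samples, weights):
    """Maximum weighted likelihood estimate of a normal mean theta.
    samples[i] holds the observations from source i and weights[i] is the
    weight lambda_i on each of their log-likelihood terms. With a known,
    common variance, setting d/dtheta sum_i lambda_i sum_j log N(x_ij; theta, s^2)
    to zero gives sum_i lambda_i sum_j x_ij / sum_i lambda_i n_i."""
    num = sum(w * sum(xs) for xs, w in zip(samples, weights))
    den = sum(w * len(xs) for xs, w in zip(samples, weights))
    return num / den
```

Giving the other sources zero weight recovers the ordinary sample mean of the first source, while equal weights give the pooled mean. Intermediate weights trade the bias introduced by the other sources against the reduction in variance, which is the trade-off the weight-selection procedure has to manage.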

DATE/PLACE: Monday, June 4, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Priors for Bayesian Neural Networks
SPEAKER: Mark Robinson
Department of Statistics
University of British Columbia
In recent years, neural networks (NNs) have become a popular data-analytic tool in statistics, computer science and many other fields. NNs can be used as universal approximators, that is, as a tool for regressing a dependent variable on a possibly complicated function of the explanatory variables. Unfortunately, NN parameters are notoriously hard to interpret. Taking a Bayesian view, we propose and discuss prior distributions for some of the network parameters that encourage parsimony and reduce overfitting by eliminating redundancy and promoting orthogonality, linearity or additivity. We thus consider more senses of parsimony than are discussed in the existing literature. We investigate the predictive performance of networks fit under these various priors.

DATE/PLACE: Thursday, May 31, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Rd, UBC
TYPE: Joint Workshop / Biostats Research Group
TITLE: Case-control studies with misclassified exposure: a Bayesian approach
SPEAKER: Refik Saskin
Department of Microbiology
Mount Sinai Hospital, Toronto

With case-control data, it is often the case that exposure to the risk factor of interest is subject to misclassification. Methods for correcting the odds ratios are available when the misclassification probabilities are known. In practice, however, only good guesses, rather than exact values, are available for these probabilities. We show that when these guesses are treated as exact, even the smallest differences between the true and guessed values can lead to highly erroneous odds-ratio estimates. This problem is alleviated by a Bayesian analysis that incorporates the uncertainty about the misclassification probabilities as prior information.

In practice, data on the exposure variable are quite often available from more than one source. We review three methods for improving the odds-ratio estimates that combine information from two sources. We then develop a Bayesian approach which is based on latent class analysis, and apply it to the sudden infant death syndrome data.

The inference required the use of the Metropolis-Hastings algorithm and/or the Gibbs sampler.
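The correction for known misclassification probabilities can be sketched for a 2x2 case-control table. The sketch assumes non-differential misclassification, with the same sensitivity Se and specificity Sp in cases and controls. If a of N subjects are truly exposed, the expected number classified as exposed is Se·a + (1 − Sp)(N − a), and inverting this linear relation recovers a. The function names and the counts in the example below are hypothetical.

```python
def corrected_exposed(observed_exposed, total, se, sp):
    """Solve observed = se * a + (1 - sp) * (total - a) for the true count a."""
    return (observed_exposed - (1.0 - sp) * total) / (se + sp - 1.0)

def corrected_odds_ratio(case_exp, n_cases, ctrl_exp, n_controls, se, sp):
    """Exposure odds ratio after correcting both groups' exposure counts for
    an assumed sensitivity se and specificity sp (non-differential)."""
    a = corrected_exposed(case_exp, n_cases, se, sp)
    b = corrected_exposed(ctrl_exp, n_controls, se, sp)
    return (a * (n_controls - b)) / (b * (n_cases - a))
```

For example, suppose 48 of 100 cases and 30 of 100 controls are classified as exposed. The uncorrected odds ratio is about 2.15. Assuming Se = 0.9 and Sp = 0.8 gives a corrected odds ratio of 4, while Sp = 0.75 gives 46/7 ≈ 6.6. At Sp = 0.7 the corrected control count drops to zero and the estimate breaks down. This is the sensitivity to the guessed values that motivates treating them as uncertain.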


DATE/PLACE: Tuesday, May 29, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Rd, UBC
TYPE: Joint Workshop / Biostats Research Group
TITLE: Regression analysis of cluster-correlated data using estimating equations
SPEAKER: Brian Leroux
Department of Biostatistics
University of Washington
This talk concerns estimating equation methods for fitting regression models to cluster-correlated data such as arise from studies using cluster sampling or longitudinal designs. Some background will be presented, with an emphasis on the impact of the correlation structure of the response on the relative efficiency of different estimating equations. A new method is presented based on the theory of optimal combinations of estimating equations. This method yields more efficient estimators than the standard generalized estimating equations of Liang and Zeger in some situations while avoiding the need to model the correlation structure. The results are supported by both asymptotic theory and simulation studies. This talk is based on joint work with Julie Stoner, University of Nebraska.


PIMS-MITACS Seminar Series on
Computational Statistics and Data Mining

DATE/PLACE: Thursday, April 26, 2001, 16:00
PIMS UBC, 1933 West Mall
West Mall Annex, Room 216
TITLE: A Simple Model for a Complex System: Predicting Travel Times on Freeways
SPEAKER: Professor John Rice
Department of Statistics
University of California, Berkeley
A group of researchers from the Departments of EECS, Statistics, and the Institute for Transportation Research at UC Berkeley has been collecting and studying data on traffic flow on freeways in California. I will describe the sources of data and give an overview of the problems being addressed. I will go into some detail on a particular problem: forecasting travel times over a network of freeways. Although the underlying system is very complex and tempting to model, a simple model is surprisingly effective at forecasting.

DATE/PLACE: Tuesday, April 3, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Assessing Informative Drop-out in Models for Repeated Binary Data
SPEAKER: Lee Er
Department of Statistics
University of British Columbia

Drop-outs are a common problem in longitudinal studies. In terms of statistical models for the data, there are three types of drop-out mechanisms: drop-out occurring completely at random (CRD), drop-out occurring at random (RD) and informative drop-out (ID). The drop-out mechanism is classified as CRD if it is independent of the measurements; as RD if it depends only on the observed measurements and not on the unobserved ones; and as ID if it depends on the unobserved measurements (possibly in addition to the observed ones). CRD and RD are referred to as ignorable because the drop-out mechanism can be ignored for the purpose of likelihood-based inference about the measurements, while ID is non-ignorable. Analyses based on an assumption of ignorable drop-out, when in reality the drop-out mechanism is non-ignorable, can lead to misleading or biased results. Likelihood-based models for continuous and categorical longitudinal data subject to non-ignorable drop-out have been developed. In this talk, we focus on exploring likelihood-based models for binary longitudinal data subject to informative drop-out.

The two modelling approaches considered are a selection model proposed by Baker (1995) and a transition model proposed by Liu et al. (1999). We apply these models to a data set from a multiple sclerosis (MS) clinical trial. The aims of the analyses are to investigate whether there is an indication of informative drop-out in these data, and to assess the sensitivity of inferences concerning the treatment effects to the underlying drop-out mechanism. We do not attempt to provide a definitive analysis of the data set, but rather explore a variety of models that incorporate informative drop-out.
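A small simulation can make the distinction between ignorable and non-ignorable drop-out concrete. The sketch below uses hypothetical two-visit binary data, not the MS trial data. The first response y1 is always observed, and the second response y2 may be lost to drop-out. P(y2 = 1) is estimated by averaging the observed-case proportions within levels of y1. This adjustment is valid under CRD and RD, but it is biased under the ID mechanism: in this set-up the true value is 0.5, and the estimate comes out near 0.41 in expectation. All probabilities and names are illustrative assumptions.

```python
import random

def simulate(mechanism, n=100_000, seed=1):
    """Two-visit binary data: y1 always observed, y2 missing after drop-out.
    The drop-out probability is constant (CRD), depends on the observed y1 (RD),
    or depends on the possibly unobserved y2 itself (ID)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = int(rng.random() < 0.5)
        y2 = y1 if rng.random() < 0.8 else 1 - y1   # serial dependence
        if mechanism == "CRD":
            p_drop = 0.3
        elif mechanism == "RD":
            p_drop = 0.5 if y1 == 1 else 0.1
        elif mechanism == "ID":
            p_drop = 0.5 if y2 == 1 else 0.1
        else:
            raise ValueError(mechanism)
        rows.append((y1, None if rng.random() < p_drop else y2))
    return rows

def estimate_p_y2(rows):
    """Estimate P(y2 = 1) as sum over y1 of P(y1) * P(y2 = 1 | y1, y2 observed).
    Consistent when drop-out does not depend on y2 given y1 (CRD or RD)."""
    total = 0.0
    for v in (0, 1):
        group = [y2 for y1, y2 in rows if y1 == v]
        seen = [y2 for y2 in group if y2 is not None]
        total += len(group) / len(rows) * sum(seen) / len(seen)
    return total
```

Nothing in the observed data alone flags the ID case. The bias appears only because drop-out depends on values that were never recorded, which is why the sensitivity of inferences to the assumed mechanism has to be assessed through models such as those above.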

DATE/PLACE: Thursday, March 29, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Rd, UBC
TYPE: Research Seminar
TITLE: Log-Rank Tests When The Main Interest Is in Differences in Durability of Treatment Response, with Application to AIDS Clinical Trials
SPEAKER: Lang Wu
Department of Statistics
University of British Columbia
At present, many AIDS clinical trials of combination antiretroviral therapies compare treatments using a time-to-failure primary endpoint that measures the durability of suppression of HIV-1 replication. In such studies, early and/or late survival differences between two treatments are of primary interest. We propose a weighted log-rank statistic that emphasizes early and/or late survival differences. We also consider some versatile tests that are sensitive to a wider range of alternatives. The performance of these new tests is evaluated by simulation. When the main interest is in comparing treatments by the durability of suppression in responders, these tests may be preferred to the log-rank test that is currently applied routinely.

* This work in progress is joint with Peter Gilbert
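One standard way to build a log-rank statistic that emphasizes early or late differences is the Fleming–Harrington G(ρ, γ) family. It weights each event time by Ŝ(t−)^ρ (1 − Ŝ(t−))^γ, where Ŝ is the pooled Kaplan–Meier estimate, so ρ > 0 up-weights early differences and γ > 0 late ones. The abstract does not say which weights the proposed statistic uses, so treat this sketch as an illustration of the general idea rather than the speakers' test. The function name is hypothetical.

```python
import math

def weighted_logrank_z(times, events, groups, rho=0.0, gamma=0.0):
    """Standardized weighted log-rank statistic comparing group 1 with group 0.
    times: follow-up times; events: 1 = failure, 0 = censored; groups: 0 or 1.
    Weights are S(t-)^rho * (1 - S(t-))^gamma with S the pooled Kaplan-Meier
    estimate; rho = gamma = 0 gives the ordinary log-rank test."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    s_prev = 1.0          # pooled Kaplan-Meier estimate just before t
    score = var = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                # at risk
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)    # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e)          # failures at t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        w = s_prev ** rho * (1.0 - s_prev) ** gamma
        score += w * (d1 - d * n1 / n)                            # observed - expected
        if n > 1:
            var += w * w * d * (n1 / n) * (1.0 - n1 / n) * (n - d) / (n - 1)
        s_prev *= 1.0 - d / n
    return score / math.sqrt(var)
```

The resulting Z is referred to the standard normal distribution. The choice of ρ and γ has to be fixed in advance to match the clinical question, since it determines which part of follow-up dominates the comparison. The more versatile tests mentioned in the abstract are not reproduced here.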

DATE/PLACE: Tuesday, March 27, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Recurrence Relation for Minimum Variance Unbiased Estimation of Certain Left-Truncated Poisson Distributions
SPEAKER: Prof. Jagdish Ahuja
Dept of Mathematical Sciences
Portland State University
The minimum variance unbiased estimators, g(z,n) and G(z,n), for the parameters of Poisson distributions truncated on the left at zero and at one respectively, based on a sample of size n and sample total z, are obtained using one of the results for the generalized power series distribution. These estimators, which depend on Stirling numbers of the second kind and their linear combinations, are difficult to calculate even for small values of n and z. Recurrence relations for g(z,n) and G(z,n) that do not involve Stirling numbers of the second kind are provided; these are useful for tabulation purposes. The behavior of these two estimators as functions of n and z is also examined. Further, a recurrence relation for the minimum variance unbiased estimator, gc(z,n), for the parameter of a Poisson distribution truncated on the left at 'c', based on a sample of size n and sample total z, is also derived using a recurrence formula for the generalized Stirling numbers of the second kind.
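For g(z, n), the generalized power series argument gives the closed form z·S(z−1, n)/S(z, n), where S(·, n) are Stirling numbers of the second kind. This follows from (e^λ − 1)^n = Σ_z n! S(z, n) λ^z / z!. The sketch below evaluates that Stirling-number form directly with exact rational arithmetic. It is not the Stirling-free recurrence presented in the talk, and it covers only truncation at zero. The function names are hypothetical.

```python
from fractions import Fraction

def stirling2_row(z):
    """Return [S(z, 0), ..., S(z, z)], Stirling numbers of the second kind,
    built with the recurrence S(m, k) = k * S(m-1, k) + S(m-1, k-1)."""
    row = [1]                                   # S(0, 0) = 1
    for m in range(1, z + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            new[k] = k * (row[k] if k < m else 0) + row[k - 1]
        row = new
    return row

def mvue_zero_truncated_poisson(z, n):
    """MVUE g(z, n) of lambda for a Poisson distribution truncated at zero,
    from n observations with total z: z * S(z-1, n) / S(z, n)."""
    if z < n:
        raise ValueError("each observation is at least 1, so z >= n")
    s_prev = stirling2_row(z - 1)
    s_curr = stirling2_row(z)
    numer = z * (s_prev[n] if n <= z - 1 else 0)   # S(z-1, n) = 0 when n = z
    return Fraction(numer, s_curr[n])
```

For instance, a single observation gives g(z, 1) = z for z ≥ 2 and g(1, 1) = 0, and two observations totalling 4 give g(4, 2) = 4·3/7 = 12/7.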

DATE/PLACE: Tuesday, March 20, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Psychometric Theory: Statistical Methodology for the Analysis of Measurement Data
SPEAKER: Bruno D. Zumbo, Ph.D.
Measurement, Evaluation, and Research Methodology Program
University of British Columbia

Error of measurement is pervasive in data arising from social and behavioral research (including research in education and the health sciences). Methods for quantifying and studying this error of measurement have evolved over nearly a century into a coherent statistical approach. However, there has been some confusion as to the role of apparent "assumptions", such as correlated errors, in the models. I will present an approach to conceptualizing psychometric models that brings to light relations among concepts in probability, statistics, and measurement. Furthermore, this approach clarifies some of the confusion that is creeping into research methodology in the social and behavioral sciences about the various assumptions in the models. Where appropriate, I will also present some examples with real data from the social and health sciences.


PACIFIC NORTHWEST STATISTICS CONFERENCE Friday, March 16, 2001
Simon Fraser University

See the conference web site for more information.


DATE/PLACE: Tuesday, March 13, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Nonparametric testing for a monotone hazard function: making a global test local
SPEAKER: Prof. Nancy Heckman
Department of Statistics
UBC

There are several well-known tests of the null hypothesis that the hazard function is non-increasing. However, these tests were designed to have power against the alternative that the hazard function is always increasing. They have little power when, say, the hazard is increasing on only a small interval. Fortunately, we can modify these tests so that they can detect an increase on a small interval. Specifically, the test of Proschan and Pyke (1967), based on normalized spacings, is modified to a more local test. The significance level of the local test is attained when the data are exponentially distributed, so p-values are easily calculated via simulation. The idea of localizing the Proschan and Pyke test is inspired by recent developments in nonparametric inference for bump-hunting in regression analysis.


*This is joint work with
Irene Gijbels
Institute of Statistics
Catholic University of Louvain
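To make the global Proschan and Pyke test described above concrete, here is a Python sketch of its statistic as it is commonly described: it counts pairs i < j whose normalized spacings satisfy D_i > D_j, a count that tends to be large when the hazard increases. Because the level is attained at the exponential, the p-value is simulated from exponential samples. The localized version from the talk is not reproduced, and the function names are hypothetical.

```python
import random


def normalized_spacings(data):
    """D_i = (n - i + 1) * (X_(i) - X_(i-1)), i = 1..n, with X_(0) = 0 (positive data assumed)."""
    x = sorted(data)
    n = len(x)
    prev = 0.0
    spacings = []
    for i, xi in enumerate(x, start=1):
        spacings.append((n - i + 1) * (xi - prev))
        prev = xi
    return spacings


def proschan_pyke_statistic(data):
    """Count pairs i < j with D_i > D_j; large values point toward an increasing hazard."""
    d = normalized_spacings(data)
    return sum(1 for i in range(len(d)) for j in range(i + 1, len(d)) if d[i] > d[j])


def monte_carlo_p_value(data, n_sim=999, seed=0):
    """Simulate the null distribution at the exponential, where the level is attained."""
    rng = random.Random(seed)
    observed = proschan_pyke_statistic(data)
    n = len(data)
    # The statistic depends only on the ordering of the D_i, so the exponential rate is irrelevant.
    exceed = sum(
        proschan_pyke_statistic([rng.expovariate(1.0) for _ in range(n)]) >= observed
        for _ in range(n_sim)
    )
    return (1 + exceed) / (1 + n_sim)
```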


DATE/PLACE: Thursday, March 8, 2001
Simon Fraser University
TYPE: Research Seminar
TITLE: Detection of Change in Environmental Series
SPEAKER: Sylvia Esterby
Department of Mathematics and Statistics
Okanagan University College

Detection and estimation of change are recurring themes in environmental assessment. Quality indicators are measured to answer questions about the status of, and change in, the quality of the environmental compartment of interest. Although such questions seem straightforward, they have many aspects. In the area of water quality, temporal records of quality indicators are generally short, which partly dictates the methods that can be used. Several topics related to the assessment of water quality will be discussed, including trend detection and estimation and methods for identifying similar patterns; the latter involves cluster analysis.
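The abstract does not name specific trend methods. As one widely used option for short water-quality records, and not necessarily one of those discussed in the talk, the Python sketch below implements the Mann-Kendall trend test (without a tie correction) and the Sen slope estimator for equally spaced observations.

```python
from math import erf, sqrt
from statistics import median


def mann_kendall(series):
    """Return (S, z, two-sided p-value) for the Mann-Kendall trend test, assuming no ties."""
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18  # null variance of S without a tie correction
    if s > 0:
        z = (s - 1) / sqrt(var_s)  # continuity-corrected normal score
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p-value
    return s, z, p


def sen_slope(series):
    """Sen's trend estimate: the median of all pairwise slopes, for equally spaced times."""
    n = len(series)
    return median(
        (series[j] - series[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)
    )
```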

DATE/PLACE: Tuesday, March 6, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Introduction to Spatial Point Processes
SPEAKER: Mark Robinson
Department of Statistics
UBC
In this talk, I will discuss some of what I learned during my exchange semester in Aalborg, Denmark, where the semester's topic was spatial point processes (SPPs). I will cover the main topics of the course, including the Poisson point process, summary statistics of an SPP, various SPP models, and the estimation of parameters in such models. If time permits, I will discuss a few (Bayesian) applications.
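As a minimal illustration of two of these topics, the Python sketch below simulates a homogeneous Poisson point process on a rectangle (a Poisson number of points, placed independently and uniformly) and computes a simple estimate of Ripley's K function, one common SPP summary statistic. The edge-uncorrected estimator and the function names are simplifying assumptions made here; under complete spatial randomness K(r) = πr², and the uncorrected estimate tends to fall below that.

```python
import math
import random


def simulate_poisson_process(intensity, width, height, rng):
    """Homogeneous Poisson process with the given intensity on [0, width] x [0, height]."""
    mean = intensity * width * height
    # N ~ Poisson(mean): count unit-rate exponential arrivals that fall in [0, mean].
    n, t = 0, rng.expovariate(1.0)
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    # Given N = n, the points are i.i.d. uniform on the window.
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def naive_k_function(points, r, area):
    """Ripley's K estimate without edge correction (biased low as r grows)."""
    n = len(points)
    if n < 2:
        return float("nan")
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= r
    )
    return area * close_pairs / (n * (n - 1))
```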

DATE/PLACE: Thursday, March 1, 2001, 16:00
University of British Columbia
TYPE: Research Seminar
TITLE: Statistical Challenges in Associating Air Pollution and Lung Cancer Incidence

SPEAKER: Nhu Le
Department of Statistics
University of British Columbia

In this talk, I will describe some statistical difficulties in studying the relationship between air pollution and lung cancer. Specifically, I will discuss problems in estimating cumulative air pollution exposure, in a case-control setting, for individuals whose residence may have changed several times. Some recent advances in statistical theory for dealing with such problems will be presented, along with preliminary results from a case-control study of lung cancer patients in British Columbia.


DATE/PLACE: Thursday, February 15, 2001
Conference Room 6
Providence Building
St. Paul's Hospital
TYPE: Research Seminar
TITLE: The Impact of Study Design on Estimating Efficacy and Effectiveness in Epidemiology
SPEAKER: Adrian Levy, PhD
Department of Health Care and Epidemiology, UBC
Centre for Health Evaluation and Outcome Sciences, St. Paul's

There is widespread consensus that, because of randomization, internal validity is likely to be stronger in randomized controlled trials (RCTs) than in non-randomized, observational designs. For this reason, many investigators believe that the best estimate of the benefit of a medical treatment is obtained from an RCT, and it is often stated that RCTs form a "gold standard". Other investigators believe that, because of their wider external validity, observational designs can also yield valid estimates of treatment effect. Several reviews comparing the two designs have found no obvious pattern: neither RCTs nor observational studies consistently gave larger or smaller estimates of the treatment effect. When the results from the two study designs agree, the decision to treat may be straightforward. When the two designs give different answers, clinicians and their patients face difficult choices. It behooves the research community to address the question, "What should we do when randomized controlled trials and observational studies disagree, and which type of study design is more likely to give the truth?" This presentation will introduce some differences between the study designs that may be expected on theoretical grounds and review empirical evidence of such differences in studies of treatments for cardiovascular disease.


*Directions (thanks to Anona Thorne, both for these and for organizing this session!)

Conference Room 6 is in the conference centre on the "basement" level of the Providence Building. The easiest way to get there is from the information wicket, which is at the south end of the Burrard Building (the old building that fronts on Burrard):
- Follow the blue line on the floor from the wicket through the Providence wing to the elevators at the end of that wing (signs along the way indicate the conference centre, among other destinations).
- Take the elevator down to level 1.
- When you get out of the elevator, you'll see a sign directing you to the conference centre.
- Conference Room 6 is near the end of the hallway on the left.

The St. Paul's Hospital parking lot is often full at that time of day, but there are many other parking lots in the neighbourhood.


DATE/PLACE: Thursday, February 8, 2001, 14:00
Leonard S. Klinck 301
UBC
TYPE: Joint Workshop / Biostats Research Group
TITLE: Gene Expression Analysis from DNA Microarrays
SPEAKER: Jennifer Bryan
Biostatistics
UC Berkeley

Microarrays allow researchers to capture the intensity of expression for thousands of genes at once. Often we compare expression in two tissues (for example, healthy versus cancerous or pre-treatment versus post-treatment) and attempt to identify genes that exhibit biologically meaningful expression profiles. For example, we might be interested in genes that are differentially expressed or that exhibit strong coexpression with other genes.

In this talk, I describe the use of a deterministic rule, applied to the parameters of the gene expression distribution, to select a target subset of genes. The target subset is the parameter of interest, which can be estimated by applying the subset rule to observed sample statistics. I will discuss the conditions necessary for consistency of the subset estimator and will provide a sample size formula. Important features of the sampling distribution are estimated with the parametric bootstrap. The practical performance of the method is illustrated with an analysis of breast cancer data.
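A minimal Python sketch of this plug-in idea, under assumptions that are ours rather than the speaker's: the hypothetical rule selects genes whose mean log-ratio exceeds a threshold in absolute value, and each gene's replicates are modeled as normal for the parametric bootstrap. The output is the fraction of bootstrap replicates in which each gene lands in the estimated subset, one simple feature of the subset estimator's sampling distribution.

```python
import random
import statistics


def subset_rule(means, threshold):
    """Hypothetical rule: select genes whose mean log-ratio exceeds threshold in absolute value."""
    return {g for g, m in enumerate(means) if abs(m) > threshold}


def estimate_subset(data, threshold):
    """Plug-in estimate: apply the same rule to sample means (data[g] = replicates of gene g)."""
    return subset_rule([statistics.fmean(x) for x in data], threshold)


def parametric_bootstrap_inclusion(data, threshold, n_boot=200, seed=0):
    """Fraction of normal parametric bootstrap replicates in which each gene is selected."""
    rng = random.Random(seed)
    fitted = [(statistics.fmean(x), statistics.stdev(x), len(x)) for x in data]
    counts = [0] * len(data)
    for _ in range(n_boot):
        # Redraw every gene's replicates from its fitted normal model, then reapply the rule.
        resampled = [[rng.gauss(mu, sd) for _ in range(n)] for mu, sd, n in fitted]
        for g in estimate_subset(resampled, threshold):
            counts[g] += 1
    return [c / n_boot for c in counts]
```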


DATE/PLACE: Friday, February 2, 2001, 09:30
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Robust inference using the quadratic inference function
SPEAKER: Chanseok Park
Dept of Statistics
Pennsylvania State University
We propose a new robust inference methodology based on the quadratic inference function. With this tool one can optimally combine model-efficient and model-robust estimating functions to construct an efficient but robust parameter estimator. For example, one can create an adaptive estimator of the mean and median that is fully efficient at the normal model but is highly robust, with a 25% asymptotic breakdown point. The methodology includes robust chi-square tests of parametric hypotheses as well as a chi-square goodness-of-fit statistic for the modeling hypotheses. We examine the effects of nuisance parameters and methods for dealing with them. We compare the performance of this new method with other well-known methods by Monte Carlo simulations.

DATE/PLACE: Thursday, February 1, 2001
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Research Seminar
TITLE: Composite likelihood based inference for hierarchical models
SPEAKER: Subhash Lele
Math Sciences
University of Alberta

Hierarchical models have proved useful in many epidemiological and ecological investigations. They are particularly useful for modelling non-normal data (counts or binary data) in spatial and space-time series settings. Likelihood-based statistical inference for hierarchical models is challenging because of its analytical and computational complexity. In this presentation, I suggest a method of inference based on the concepts of composite likelihood and estimating functions. I will review some of the earlier work in this area (Lele, 1997; Heagerty and Lele, 1998; Lele and Taper, 2001) and extend it to non-linear, non-normal time series and space-time series settings. Instead of trying to write the composite likelihood in analytical form (which may not even be possible for realistic models), I use Monte Carlo methods to estimate it. This "smooth simulated composite likelihood (SSCL)" function, when maximized, yields consistent and asymptotically normal estimators. Moreover, the computational burden of this method, although substantial, is not excessive. I will discuss some applications of this method to ecological and epidemiological problems.


DATE/PLACE: Wednesday, January 31, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road
TITLE: Computing expectations by conditioning
SPEAKER: Peter Liu
Department of Statistics
UBC
One of the important concepts in probability theory is the expectation of a random variable. Situations often arise where information about another random variable helps us compute the expected value of the original random variable. The talk will be presented as a 50-minute lecture for statistics students at the 300 level. A proof of the equation E[X] = E[E[X|Y]] will be given, and computations and applications will be illustrated with examples.
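A small worked example of the identity, chosen here for illustration: roll a fair die to get Y, then flip Y fair coins and let X be the number of heads. Given Y = y, X is Binomial(y, 1/2), so E[X|Y] = Y/2 and E[X] = E[Y]/2 = 7/4. The Python sketch below computes E[X] both directly from the joint distribution and by conditioning, using exact fractions.

```python
from fractions import Fraction
from math import comb

# Fair die: P(Y = y) = 1/6 for y = 1..6.
die = {y: Fraction(1, 6) for y in range(1, 7)}


def conditional_pmf(y):
    """P(X = x | Y = y) for X ~ Binomial(y, 1/2)."""
    return {x: Fraction(comb(y, x), 2**y) for x in range(y + 1)}


def conditional_mean(y):
    """E[X | Y = y], computed from the conditional pmf (it equals y / 2)."""
    return sum(x * px for x, px in conditional_pmf(y).items())


# Direct route: sum x * P(X = x, Y = y) over the whole joint distribution.
direct = sum(py * x * px for y, py in die.items() for x, px in conditional_pmf(y).items())

# Conditioning route: average the conditional means over the distribution of Y.
by_conditioning = sum(py * conditional_mean(y) for y, py in die.items())
```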

DATE/PLACE: New Time

Tuesday, January 30, 2001, 16:00
Leonard S. Klinck 301
6356 Agricultural Road

TITLE: Compression and Analysis of Microarray Images
SPEAKER: Rebecka Jornsten
Department of Statistics
UC Berkeley

Microarray imaging technology makes it possible to measure the expression of thousands of genes simultaneously. It has become the standard tool for investigating fundamental biological functions and is widely used in academic and industrial laboratories, producing vast quantities of image data. Experiments are expensive, and the data may need to be re-processed in the future because statistical modeling is still evolving. For these reasons, the full image data are always kept, resulting in immense storage requirements. This calls for compression schemes that take into account the statistical inference to follow, or for a new definition of the "irrelevance" of image features based on the loss of statistical information rather than on visual distortion.

In this talk I present a microarray image compression scheme with a multi-level (lossless or lossy) coded data structure which facilitates statistical analysis and data transmission. As components, it uses adaptive segmentation, predictive coding, and wavelet transforms.

The high noise levels of microarray image data suggest the use of lossy compression. However, lossy compression necessarily loses statistical information, which may affect future statistical modeling and inference. I address the question of optimal statistical estimation based on lossily compressed data and present a new upper bound on the minimum achievable loss of estimation efficiency due to compression. This is an interesting information-theoretic result in the field of multiterminal data compression, and it can be used to evaluate the performance of practical compression schemes applied to microarray images.

Time permitting, I will briefly discuss Rissanen's minimum description length (MDL) principle, a statistical modeling principle based on data compression or coding, and its use in selecting gene subsets from microarray data for classifying types of leukemia.


DATE/PLACE: Monday, January 29, 2001, 16:15
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Feature Extraction
SPEAKER: Mu Zhu
Department of Statistics
Stanford University

The Internet has spawned a renewed interest in the analysis of co-occurrence data. Correspondence analysis can be applied to such data to yield useful information. A less well-known technique called canonical correspondence analysis (CCA) is suitable when such data come with covariates. We show that CCA is equivalent to a classification technique known as linear discriminant analysis (LDA). Both CCA and LDA are examples of a general feature extraction problem.

As a feature extraction technique, however, LDA is restrictive: it cannot pick up high-order features in the data. We propose a much more general method, of which LDA is a special case. Our method does not assume that the class density functions belong to any parametric family. We then compare our method in the QDA (quadratic discriminant analysis) setting with a competitor known as the sliced average variance estimator (SAVE). Our study shows that SAVE over-emphasizes second-order differences among classes.

Our approach to feature extraction is exploratory and has applications in dimension reduction, automatic exploratory data analysis, and data visualization. We also investigate strategies to incorporate the exploratory feature extraction component into formal probabilistic models. In particular, we study the problem of reduced-rank non-parametric discriminant analysis by combining our work in feature extraction with projection pursuit density estimation.


DATE/PLACE: Friday, January 26, 2001, 09:30
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TITLE: Proportional hazards regression model with unknown link function and applications to longitudinal data
SPEAKER: Wei Wang
Dept of Statistics
UC Davis

The Cox proportional hazards regression model usually assumes that covariates have log-linear effects on the hazard function. We consider a more general proportional hazards regression model with a nonparametric link function, focusing mainly on the case in which the baseline hazard function is also unspecified. A two-step iterative algorithm is proposed to estimate the link function and the covariate effects, and inference is based on a local version of the partial likelihood.

Large sample properties are discussed and several simulation studies are conducted to evaluate the performance of this estimation procedure.

We apply this method to data with longitudinal covariates and provide a way to handle possible missing values using functional principal components analysis.


PIMS-MITACS Seminar Series on
Computational Statistics and Data Mining

DATE/PLACE: Thursday, January 25, 2001, 16:00
PIMS UBC, 1933 West Mall
West Mall Annex, Room 216
TITLE: Robust Factor Model Fitting and Visualization of Stock Market Returns
SPEAKER: Professor Douglas Martin
Department of Statistics
University of Washington
Chief Scientist
Data Analysis Products Division of MathSoft, Inc.

Modeling stock returns and calculating portfolio risk is almost invariably accomplished by fitting a linear model, called a "factor" model in the finance community, using the sanctified method of ordinary least squares (OLS). However, it is well known that stock returns are often non-Gaussian by virtue of containing outliers, and that OLS estimates are not robust toward outliers. Modern robust regression methods are now available that are not f[...] stock returns using firm size and book-to-market as the factors, where we show that OLS gives a misleading result. Then we show how Trellis graphics displays can be used to obtain quick, penetrating visualization of stock returns factor model data, and to obtain convenient comparisons of OLS and robust factor model fits. Last but not least, we point out that robust factor model fits and Trellis graphics displays are in effect powerful "data mining tools" for better understanding of financial data. Our examples are constructed using a new S-PLUS Robust Methods library and S-PLUS Trellis graphics displays.
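To illustrate why robustness matters here, the following Python sketch contrasts OLS with a Huber M-estimator fitted by iteratively reweighted least squares for a single-factor line. The Huber estimator, the tuning constant c = 1.345, and the median-based residual scale are standard textbook choices assumed for this sketch; they are not necessarily the methods in the S-PLUS Robust Methods library.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(x, y, c=1.345, n_iter=50):
    """Huber M-estimate of (a, b) by iteratively reweighted least squares."""
    a, b = weighted_line(x, y, [1.0] * len(x))  # start from the OLS fit
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual rescaled by 0.6745 (fallback 1.0 if zero).
        scale = statistics.median(abs(r) for r in resid) / 0.6745 or 1.0
        # Huber weights: 1 within c * scale, shrinking like c * scale / |r| beyond it.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
        a, b = weighted_line(x, y, w)
    return a, b
```

With one gross outlier among twenty otherwise nearly linear points, the OLS slope is pulled well away from the underlying value, while the Huber slope stays close to it.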


DATE/PLACE: Tuesday, January 23, 2001, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Bayesian Nonparametric Modelling using Mixtures of Triangular Distributions
SPEAKER: Prof. Francois Perron
University of Montreal

Nonparametric modelling is an indispensable tool in many applications, and its formulation in a hierarchical Bayesian context, using the entire posterior distribution rather than particular expectations, increases its flexibility. In this work, the focus is on nonparametric estimation through a mixture of triangular distributions. The optimality of this methodology is addressed and bounds on the accuracy of this approximation are derived. Although our approach is more widely applicable, we focus for simplicity on estimation of a monotone nondecreasing regression function on [0,1] with additive error, effectively approximating the function of interest by a function having a piecewise linear derivative. Computationally accessible methods of estimation are described through an amalgamation of existing MCMC algorithms. Simulations and examples illustrate the approach.


*This work is based on a joint project with K. Mengersen of Queensland University of Technology.


DATE/PLACE: Monday, January 22, 2001, 16:15
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Bayesian Analysis of Computer Code Uncertainties
SPEAKER: Professor Anthony O'Hagan
University of Sheffield
United Kingdom

This talk concerns inference about complex computer codes. There is growing concern amongst users of computer models about how to assess their validity and accuracy. In the environmental sciences, for example, computer codes are widely used to model atmospheric and marine dispersion, radioactive waste disposal risks, effects of ingesting toxic chemicals, wildfire events, etc. The EPA recently organised a three-day workshop to explore ways of addressing the validation of complex environmental models. Quantifying uncertainties in the use of computer codes is clearly a statistical question, but conventional statistical methods are ill-equipped to tackle some of the issues that arise. This talk will review such problems and a Bayesian methodology that is proving to be a powerful tool in addressing them.

DATE/PLACE: POSTPONED

Thursday, January 18, 2001
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC

TYPE: Research Seminar
TITLE:
SPEAKER: Nhu Le
BC Cancer Agency

DATE/PLACE: Tuesday, January 16, 2001, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: On the Limitations of the Neyman-Pearson, Likelihood Ratio, and Maximum Likelihood Criteria
SPEAKER: Prof. Michael Perlman
Department of Statistics
University of Washington, Seattle
Simple examples, both historical and recent, support the contention of Fisher and others that the NP criterion of a most powerful size alpha test is not, in general, relevant to the purposes of scientific inquiry. In these examples, the LR criterion provides appropriate inferences despite the existence of more powerful tests of the same size. In similar, equally simple examples, however, the LR and corresponding ML criteria provide inappropriate inferences. It is of interest to identify the distinguishing features of these two classes of examples, both of which involve only well-behaved statistical models, such as families of univariate normal distributions with known variances, and are not artifacts of irregularity, contamination, or unboundedness. The second class involves hypotheses consisting of the union of two or more subhypotheses of different linear dimensions, as may occur in model selection problems. This suggests the importance of determining the geometric nature of statistical models before routinely applying LRTs and MLEs. General LISREL models, including linear structural equations and latent variable models, may be problematic.

DATE/PLACE: Monday, January 15, 2001, 16:15
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Wavelet Based Estimation for Trend Contaminated Long Memory Processes
SPEAKER: Peter Craigmile
Department of Statistics
University of Washington

A common problem in the analysis of environmental time series is how to deal with a possible trend component, which is usually thought of as large-scale (or low-frequency) variations or patterns in the series that might be best modelled separately from the rest of the series. Trend is often confounded with low-frequency stochastic fluctuations, particularly in the case of models such as fractionally differenced (FD) processes, which can account for long memory dependence (slowly decaying autocorrelation) and can be extended to encompass non-stationary processes exhibiting quite significant low-frequency components. In this talk we assume a model of polynomial trend plus FD noise and apply the discrete wavelet transform (DWT) to separate a time series into pieces that can be used to estimate both the FD process parameters and the trend. The estimation of the process parameters is based on an approximate maximum likelihood approach that is made possible by the fact that the DWT approximately decorrelates FD processes. Once the parameters have been estimated, we can then test for a non-zero trend. After outlining our work to date on testing for non-zero trends, we demonstrate our methodology by applying it to a popular climate dataset.
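One reason the DWT helps separate trend from noise: a wavelet filter with L vanishing moments annihilates polynomials of degree less than L, so away from the series boundaries the wavelet coefficients of polynomial trend plus noise are those of the noise alone. The Python sketch below checks this for a linear trend with the Daubechies D4 filter (two vanishing moments). The choice of filter is an assumption, and the FD parameter estimation from the talk is not shown.

```python
import math

SQRT3 = math.sqrt(3.0)
# Daubechies D4 scaling filter (two vanishing moments), scaled so that sum(h**2) == 1.
H = [v / (4 * math.sqrt(2.0)) for v in (1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3)]
# Quadrature mirror wavelet filter: g_k = (-1)**k * h_(3-k).
G = [(-1) ** k * H[3 - k] for k in range(4)]


def level1_wavelet_coeffs(x):
    """Level-1 D4 wavelet coefficients at even shifts, skipping boundary terms (no wrap-around)."""
    return [sum(G[k] * x[2 * j + k] for k in range(4)) for j in range((len(x) - 4) // 2 + 1)]
```

Because the coefficients are linear in the data, adding a linear trend to any series leaves these interior coefficients unchanged up to rounding error.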


DATE/PLACE: Tuesday, January 9, 2001, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Out of the Closet and Into the Streets: Towards a New Statistical Paradigm
SPEAKER: Prof. Bertrand Clarke
Department of Statistics
UBC

Reference priors originally emerged from maximizing the asymptotic expression for the Shannon mutual information between a parameter and a data set. The idea was to interpret estimation information-theoretically. In the absence of nuisance parameters, Jeffreys' prior achieves the maximum, and later extensions cover cases involving nuisance parameters.

Here, we replace the data set with a statistic and permit more elaborate conditioning. Thus we examine conditional Shannon mutual informations between a parameter and a function of the data, given a nuisance parameter and another statistic. This leads to new reference priors, a calibration of sufficiency, and to priors that depend on the data, a case ruled out by the conventional Bayesian paradigm.

a place of mind, The University of British Columbia

Department of Statistics

333-6356 Agricultural Road
Vancouver, BC, V6T 1Z2
Tel: 604.822.0570
Fax: 604.822.6960
E-mail: [UNIT E-MAIL]
