Department Seminars 2000

DATE/PLACE: Tuesday, November 28, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: CAST: A Computer-based Resource for Teaching Statistics
SPEAKER: Doug Stirling
Institute of Information Sciences and Technology
Massey University
New Zealand

The wide availability of fast computers has had enormous impact on introductory statistics courses. Computers initially allowed students to apply numerical and graphical methods to realistic data sets, thereby reducing the emphasis on numerical algorithms and formulae. More recently, it has been recognised that computers also have great potential for teaching statistical concepts, adding to their role as sophisticated calculators.

Programs such as Minitab and SAS lack features required for teaching concepts. For example, we might want to ...

  • graphically display standard distributions and show how changes to the parameters affect the shape of the distribution,
  • repeatedly select samples from a model to show the variability of graphical and numerical summaries,
  • build empirical distributions of summary statistics, confidence intervals and p-values with simulation in order to illustrate their properties, ...

While data-analysis programs can perform some of the above, they can rarely make the mechanism clear to students. Models, sampling and empirical distributions must be first-class citizens in software used for teaching statistical concepts.
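As a rough illustration of the repeated-sampling idea in the list above, here is a minimal Python sketch. It is not part of CAST; the normal model, the sample size and the z-based interval are assumptions chosen only for the example. It builds an empirical distribution of sample means and records how often a nominal 95% interval covers the true mean.

```python
import random
import statistics

def coverage_of_mean_ci(n=30, reps=2000, mu=10.0, sigma=2.0, z=1.96, seed=1):
    """Simulate repeated samples from an assumed normal model.

    Returns the list of sample means (an empirical sampling distribution)
    and the fraction of z-based intervals that cover the true mean mu.
    """
    rng = random.Random(seed)
    means = []
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5   # estimated standard error
        means.append(m)
        if m - z * se <= mu <= m + z * se:
            hits += 1
    return means, hits / reps

means, coverage = coverage_of_mean_ci()
print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"empirical coverage:   {coverage:.3f}")
```

Plotting a histogram of `means`, or re-running with a different `n`, shows the variability of a summary statistic directly.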

This talk will demonstrate CAST, a computer-based resource that is designed to teach statistical concepts. CAST is accessed using a web browser and contains both expository text and over 300 small programs (applets) that do most of the teaching. The applets share an extensive framework of code that is designed for teaching statistical concepts.

CAST can be described as a textbook with dynamic, interactive diagrams. Since students must interact with each page, it is claimed that their attention is retained and learning is improved. Even in lectures, the diagrams are effective ways to teach most statistical concepts.


DATE/PLACE: Thursday, November 23, 2000, 16:00
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Research Seminar
TITLE: Analysis of the Growth Curve Model Using Quasi-Least Squares
SPEAKER: N. Rao Chaganty
Mathematics and Statistics, Old Dominion University


DATE/PLACE: Tuesday, November 21, 2000, 16:00
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Joint Research Seminar
TITLE: Comparison of HIV Type-Specific Infectivities from Competing Risks Failure Time Data
SPEAKER: Peter Gilbert
Biostatistics, Harvard School of Public Health
To assist in the design of HIV vaccines, it is helpful to know if and how various HIV genotypes and phenotypes differ in their infectivity as defined by the per-exposure transmission probability. For male-to-female sexual exposure, this question is addressed for HIV-1 versus HIV-2 through analysis of standard competing risks failure time data from a 15-year prospective cohort study of female commercial sex workers in Dakar, Senegal. Estimation of the HIV-1/HIV-2 infectivity ratio over time is based on nonparametric estimation of the HIV-1/HIV-2 infection hazard ratio over time adjusted by estimates of the HIV-1/HIV-2 prevalence ratio in the infected exposing male partner population. Hypothesis testing is based on a test process given by a weighted difference of estimates of cumulative type-specific hazard rates adjusted for estimates of the HIV-1/HIV-2 partner prevalence ratio. Under proportional hazards assumptions, the estimation and testing procedures can adjust for time-dependent risk factors. The analysis provides evidence that HIV-1 is more infectious than HIV-2.

DATE/PLACE: Tuesday, November 14, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Minimax robust regression designs
SPEAKER: Prof. Julie Zhou
University of Victoria
This talk gives a review of classical regression designs for controlled experiments and discusses the need to study the corresponding robust designs. Two commonly used methods in robust statistics are the minimax and infinitesimal approaches. We define robust design problems for approximately linear models with correlated errors using the minimax approach. Since analytical (continuous) robust designs are usually hard to derive, we will introduce a simulated annealing algorithm to search for discrete robust designs. In many cases in which continuous robust designs have not been solved, discrete robust designs can be obtained by applying the annealing algorithm. Two examples will be given to show discrete robust designs.

DATE/PLACE: Thursday, November 9, 2000, 16:00
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Research Seminar
TITLE: Clinical Trials Conduct and Roles of Trial Statisticians
SPEAKER: Yong Hao, MD, PhD
QLT Inc., Vancouver
Clinical trials should be conducted with adherence to the highest possible ethical and scientific standards so that the rights of the trial subjects are fully protected and the trial results reflect the true science. Commonly adopted administrative and operational structures necessary for the ethical and scientific conduct of clinical trials will be introduced and discussed in this presentation. Some operational details and the role of a trial statistician will also be discussed.


DATE/PLACE: Tuesday, November 7, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Statistical analysis of repeated measurements with informative censoring times
SPEAKER: Prof. Peter (Xuekun) Song
Mathematics and Statistics Department
York University
Incomplete repeated measurement data frequently arise in medical studies. In this situation, a problem that one may face, and that has recently attracted a lot of attention, is that the incompleteness or missingness of repeated measurements is informative or related to the underlying variable of interest. To attack the problem, we propose some nonparametric and semiparametric methods, which are distribution free and can be easily implemented. The proposed methods are evaluated by numerical studies and applied to data from a clinical trial of adult schizophrenics.

DATE/PLACE: Tuesday, October 31, 2000, 16:00
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Research Seminar
TITLE: Analysis of Correlated Survival Data
SPEAKER: Prof. Tianxi Cai
Department of Biostatistics
University of Washington, Seattle
Several situations give rise to correlated failure time data. For example, in family studies in genetic epidemiology, if patients are from the same family, they may be genetically related. The Cox proportional hazards model with a random effect has been proposed for the analysis of data which consist of a large number of small clusters of correlated failure time observations.

However, Cox's model may not fit the data well. A natural generalization of Cox's model is the semi-parametric transformation model, under which an unknown transformation of the survival time is linearly related to the covariates. This class of regression models, studied by Cheng et al. (1995, 1997), includes the proportional hazards model and provides many useful alternatives to the Cox model in univariate survival analysis.

We consider semi-parametric transformation models with random effects for the analysis of the aforementioned correlated and possibly censored failure time observations. The inference procedures for the regression parameters and their large sample properties are presented. An alternative to this model is the marginal approach, which does not impose any structure on the correlation. We model each individual failure time with subject- and cluster-specific time-dependent covariates using the aforementioned linear transformation models; however, no specific parametric correlation structure is imposed on the observations. Under this setting, regression methods are proposed to analyze such correlated observations. Furthermore, we show how to construct pointwise and simultaneous confidence intervals for the survival function of subjects with a specific set of covariates. An ad hoc model selection procedure is also given.


DATE/PLACE: Thursday, October 26, 2000, 16:00
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Research Seminar
TITLE: Nonparametric and parametric analysis of right-truncated data
SPEAKER: Milena Simic
Mathematics & Statistics, SFU
In North American AIDS surveillance centers there are frequently substantial delays between the time an AIDS case is diagnosed (initial event) and the time the same case is reported. These reporting delays create problems in monitoring the epidemic so the estimation of the true number of persons who are diagnosed with AIDS is of interest. Since an AIDS case is identified only at the time it is reported (retrospectively ascertained), the reporting delay is right-truncated at the highest delay that can be possibly observed in the reporting process (from the first AIDS case diagnosed to the present). The non-parametric actuarial method is traditionally used for adjusting observed numbers to account for diagnosed but not reported AIDS cases.

In this presentation, the grouped parametric maximum likelihood approach to estimating the distribution of the reporting delay is discussed. The exact grouped likelihood method is developed for the Weibull and log-logistic distributions and applied to the Canadian AIDS surveillance data. To compare the performance of parametric and non-parametric adjustments, a few data sets were simulated and truncated at four different points in time. The two methods are found to agree for sufficiently large truncation times, and for early truncation times parametric estimation is found to provide valuable insight into the performance of non-parametric estimation.

Suggestions are made on how parametric estimation of delay can be used as a diagnostic measure for the appropriateness of the use of the non-parametric approach.


DATE/PLACE: Tuesday, October 24, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: How do I love thee? Let me count the ways:
Counting elusive populations using capture-recapture methods.
SPEAKER: Prof. Carl Schwarz
Department of Statistics and Actuarial Science
Simon Fraser University
Capture-recapture methods have a long history of being used to estimate the abundance of animal populations. But they can be used in many other situations as well. This talk will present an overview of the theory of capture-recapture methods illustrated with applications to such populations as the number of love poems penned by a poet, the number of people served by a health district, the number of plants in a field, and the number of taxi cabs in a large city.
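The abstract does not say which estimators the talk covers. As an assumed illustration of the basic idea, the classical two-sample Lincoln-Petersen estimator and Chapman's bias-adjusted variant can be sketched in Python as follows (the counts in the example are invented).

```python
def lincoln_petersen(n1, n2, m2):
    """Two-sample abundance estimate.

    n1: animals marked in the first sample
    n2: animals caught in the second sample
    m2: marked animals among the second sample (recaptures)
    """
    if m2 == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m2

def chapman(n1, n2, m2):
    """Chapman's adjusted version, defined even when m2 is zero."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# Invented example: 100 marked, 80 caught later, 20 of them marked.
print(lincoln_petersen(100, 80, 20))
print(round(chapman(100, 80, 20), 1))
```

The reasoning is that the marked fraction in the second sample, m2/n2, should roughly equal the marked fraction in the whole population, n1/N.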

DATE/PLACE: Tuesday, October 17, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Empirical Envelope MLE
SPEAKER: Prof. Mai Zhou
Dept of Statistics
University of Kentucky

Suppose the treatment and placebo survival distributions follow a proportional hazards model only after an unknown shift in location. How shall we estimate the shift as well as the proportion in hazards? This motivates the envelope MLE method.

The nonparametric maximum likelihood estimators (NPMLE) and the empirical likelihood ratio statistic sometimes do not exist in semiparametric settings. However, the NPMLE often exists in an enlarged parameter space. We propose to gradually shrink the enlarged parameter space by imposing more and more constraints. This results in the envelope MLE. The approach is a counterpart of the sieve MLE (Grenander 1981).

We shall present one case in detail: the location problem, where several samples are from the same (unknown) distribution except for different locations. We shall briefly mention other models that can be treated by envelope MLE.

A Wilks-type theorem for the empirical envelope likelihood ratio statistic and the asymptotic distribution of the location estimator are provided.

DATE/PLACE: Thursday, October 12, 2000, 16:00
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Research Seminar
TITLE: Methods for Calculating the Size Distribution of an Epidemic: The Problem of Heterogeneity
SPEAKER: Steve Marion
Department of Health Care & Epidemiology
UBC
The size of a communicable disease epidemic is the cumulative number of people infected. The objective is to compute the size distribution as a function of time. This work is ongoing. In the first part of the talk I will outline what has been accomplished so far. In the second part I will describe the main obstacle to application of existing methods in real world situations. Ideas from the audience for overcoming this obstacle would be most welcome.

Setting up models that capture the essential features of the dynamics of disease transmission is relatively easy. Simulations indicate that simple Markov process models work well. However, computing the implications of such models is surprisingly difficult. With strong assumptions, namely a closed population and homogeneous mixing (the probability of transmission from a given infected person to a susceptible in a small interval of time is the same for all susceptibles) efficient computation of the evolution of the epidemic is possible, and I will describe several related algorithms. An invariance theorem regarding the final size distribution makes it possible in principle to extend these methods to a much more general class of models that do not require the population to be closed and that incorporate realistic levels of heterogeneity in mixing. However, the feasibility of such computations is severely restricted by the necessity of storing the entire distribution of the state of the process, even though in the end one would be satisfied with knowing just the mean and the variance of the size. To illustrate the problem, I will present a model for transmission of hepatitis B in areas of high endemicity. Because of particular features of the epidemiology of this disease, computations for a realistic model are actually possible.

DATE/PLACE: Tuesday, October 10, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Robust inference for the simple linear regression and location models.
SPEAKER: Prof. Jorge Adrover
Department of Statistics
University of British Columbia

We propose robust confidence intervals and p-values for linear combinations of the parameters of a simple linear regression model. We search for robust procedures which are stable and yet informative. For instance, in the case of confidence intervals, we wish to construct intervals that are stable in the sense of achieving coverages near the nominal one even in the presence of outliers and other departures from the parametric model. Moreover, we wish to obtain intervals which are informative in the sense of having relatively short lengths. The problem of getting stable and informative confidence intervals has received attention in the literature (see Barnett and Lewis, Outliers in Statistical Data, 1994, p. 74 and references therein). However, the proposals so far seem to neglect the bias of the estimator, which turns out to be crucial for achieving stable confidence intervals.

To achieve the goals of stability and informativeness with our approach we need robust point estimates with the following properties: (1) asymptotic normality under general conditions; (2) known asymptotic bias bounds (for the intercept and slope parameters).

We consider some median-based estimates of the slope, for instance, Brown and Mood's estimate (1951), Siegel's repeated median of slopes (1982) and Theil and Sen's pairwise median of slopes (1951, 1968). We show that, to some extent, these estimates satisfy the requirements (1) and (2) above. We show that the proposed robust confidence intervals constitute an improvement over the intervals constructed around a robust point estimate using its asymptotic distribution under the "target" parametric model.
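For readers unfamiliar with the median-based slope estimates named above, the following Python sketch shows the pairwise median of slopes (Theil and Sen) and the repeated median (Siegel). It only illustrates the point estimates, not the proposed confidence intervals. Taking the intercept as the median of y - slope * x is one common convention and is an assumption here, as is the toy data set.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Slope = median of all pairwise slopes; intercept = median of y - slope * x."""
    slopes = [(yj - yi) / (xj - xi)
              for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2)
              if xj != xi]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept

def repeated_median_slope(xs, ys):
    """For each point take the median slope to all other points, then the median of those."""
    pts = list(zip(xs, ys))
    inner = []
    for i, (xi, yi) in enumerate(pts):
        s = [(yj - yi) / (xj - xi)
             for j, (xj, yj) in enumerate(pts) if j != i and xj != xi]
        inner.append(median(s))
    return median(inner)

# Toy data on the line y = 1 + 2x, with one gross outlier in the last response.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 100]
print(theil_sen(xs, ys))
print(repeated_median_slope(xs, ys))
```

Both estimates ignore the single outlier in this example, whereas a least-squares slope would be pulled strongly towards it.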

The location-scale model is also considered. In this case, we may take advantage of some additional information on the sign of the bias so as to yield shorter confidence intervals.


DATE/PLACE: Thursday, September 28, 2000, 16:00
LSK 301 (formerly CSCI 301)
6356 Agricultural Road, UBC
TYPE: Journal Club Session
TITLE: An introduction to instrumental variables for epidemiologists
LEADER: Paul Gustafson
Department of Statistics
UBC
Although instrumental variable (IV) methods are widely used in some fields (notably Economics), most statisticians (including the discussion leader) are not familiar with them. After discussing the basic notion of an instrument, we will work through the paper's central example involving non-compliance in a randomized trial. Then we will try to understand some of the author's remarks about the use of IV methods in more complex situations.

Note: The paper is available for copying in the UBC Statistics Mail Room. The journal is in the UBC library's e-journal collection, so getting it on-line should be possible for most of you.

Read it in advance, so you can contribute to the discussion.


DATE/PLACE: Tuesday, September 26, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Ordered multivariate extremes
SPEAKER: Saralees Nadarajah
Dept of Statistics and Applied Probability
University of California, Santa Barbara

In recent years statistical extreme value theory has matured to the extent that it can contribute usefully to the study of substantial real problems, particularly in the area of environmental extremes. Examples include the design of off-shore structures (Coles and Tawn, 1994) and the study of reservoir flood safety (Anderson and Nadarajah, 1993).

A fairly commonly occurring characteristic is that the variables whose extremes are of interest are ordered. In hydro-meteorology one thing that is of interest is the dependence of extreme values of d-hour rainfall over a range of values of d. One approach is to fit a multivariate extreme value distribution over that range. If X(d) denotes rainfall aggregated over d hours, and if d' > d then X(d) <= X(d') <= (d'/d) X(d) for all (X(d), X(d')), so an order restriction in the multivariate extreme value model is needed. Similar order restrictions arise in the study of the joint distributions of large hourly mean wind speeds and large wind gusts.

The aim of this talk is to develop multivariate extremal models and associated statistical procedures for vector observations whose components are subject to an order relationship. We consider only the bivariate case. The results are applied to the joint analysis of rainfall extremes corresponding to different durations.


PIMS / MITACS Seminar Series On
Computational Statistics and Data Mining

DATE/PLACE: Friday, September 22, 2000, 13:30
Leonard S. Klinck 301 (formerly Computer Science 301)
Note Time and Place
TITLE: Depth Tests of Symmetry and Regression
SPEAKER: Professor Peter J. Rousseeuw
Department of Mathematics and Computer Science
Universitaire Instelling Antwerpen, Belgium
 

DATE/PLACE: Thursday, September 21, 2000, 16:00
CICSR 208, 2366 Main Mall
Note Time and Place
TITLE: An Introduction to Regression Depth
SPEAKER: Professor Peter J. Rousseeuw
Department of Mathematics and Computer Science
Universitaire Instelling Antwerpen, Belgium

In this talk we introduce a notion of depth in the regression setting. It provides the "rank" of any line (plane), rather than ranks of observations or residuals. In simple regression we can compute the depth of any line by an O(n log n) algorithm. For any bivariate data set Zn of size n there exists a line with depth at least n/3. The largest depth in Zn can be used as a measure of linearity versus convexity. In both simple and multiple regression we consider the deepest fit, which generalizes the univariate median and is equivariant for monotone transformations of the response. Throughout, the errors may be skewed and non-identically distributed (e.g. heteroskedastic). We also construct depth-based regression quantiles. They estimate the quantiles of y given x, as do the L1-based regression quantiles, but can withstand the effect of leverage points. Using the concept of regression depth, we obtain some new results in discrete geometry.
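The abstract does not spell out the depth computation. The sketch below is one way to compute it in simple regression, assuming the usual definition: the depth of a line is the smallest number of observations that must be removed so that all residuals to the left of some split point have one sign and all residuals to the right have the other. It assumes distinct x values and is not necessarily the speaker's algorithm; its cost is dominated by the sort.

```python
def regression_depth(xs, ys, slope, intercept):
    """Regression depth of the line y = intercept + slope * x (distinct x assumed)."""
    pts = sorted(zip(xs, ys))
    res = [y - (intercept + slope * x) for x, y in pts]
    total_nonneg = sum(r >= 0 for r in res)
    total_nonpos = sum(r <= 0 for r in res)
    left_nonneg = left_nonpos = 0
    depth = len(pts)
    for r in res:                       # split just to the right of each x in turn
        left_nonneg += r >= 0
        left_nonpos += r <= 0
        right_nonneg = total_nonneg - left_nonneg
        right_nonpos = total_nonpos - left_nonpos
        # Points to remove so that residuals are (-, +) or (+, -) around the split.
        depth = min(depth,
                    left_nonneg + right_nonpos,
                    right_nonneg + left_nonpos)
    return depth

print(regression_depth([1, 2, 3], [1, 2, 3], slope=1, intercept=0))    # line through every point
print(regression_depth([1, 2, 3], [1, 2, 3], slope=0, intercept=0))    # line below every point
print(regression_depth([1, 2, 3], [0, 1, 0], slope=0, intercept=0.5))
```

A line that passes through all the data has depth n, while a line lying entirely below (or above) the data has depth 0.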


DATE/PLACE: Tuesday, September 12, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Expanding the Capabilities of B-WISE
SPEAKER: Nathan Johnson
Department of Statistics
University of British Columbia

The B-WISE (Bayesian regression With Interactions and Smooth Effects) method of regression modelling was designed to improve upon other flexible regression schemes in the areas of model interpretability and ease of implementation. B-WISE models have been shown to have good predictive performance, and performance can be even further improved with a natural Bayesian model averaging scheme.

In this presentation I will outline some ways in which some of the constraints inherent in a B-WISE model can be relaxed, so that the technique can be used in more general situations and with even greater flexibility. We will see that much of the interpretability of B-WISE models is retained and implementation is still relatively straightforward.


DATE/PLACE: Thursday, August 17, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Models for the development of tumours in Neurofibromatosis 2
SPEAKER: Ryan Woods
Department of Statistics
University of British Columbia

Neurofibromatosis 2 (NF2) is a rare genetic disease that affects approximately 1 in 40000 people. Some of the characteristic features of this disease include the onset of multiple tumours on the cranial and spinal nerves, juvenile cataracts and hearing loss. Almost all affected individuals develop bilateral tumours of the Schwann cells that line the vestibular nerves; these tumours are called vestibular schwannomas (VS). Evidence from molecular genetic studies has suggested that a "2-hit" hypothesis is appropriate for the development of VS in patients with NF2; that is to say that a tumour cell develops from a normal Schwann cell after the cell sustains two mutations to its genetic material. Several authors have proposed probabilistic models for this process and have shown that such models are consistent with incidence data for several different types of cancer.

We will discuss a selection of probabilistic models for a "2-hit" hypothesis and present the results from the fitting of such models to incidence data from NF2 patients. Molecular evidence does not exclude the possibility that additional hits are necessary for the development of VS. We will therefore discuss a "3-hit" model and compare this model's fit to both the data and to the fit of the "2-hit" models. Genotype-phenotype correlations have been reported in patients with NF2 and thus a model that incorporates a patient's genotype is presented. Finally, a bivariate model is proposed to estimate the distributions of the ages at onset of both the first and second VS. All of these models will be presented with minimal mathematical detail; emphasis will be on the application of such models to patient data and on the results.


DATE/PLACE: Thursday, July 20, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Approximate exact sampling: Towards the general application of Propp and Wilson's algorithm*
SPEAKER: Professor Chris Jennison
Statistics Group
School of Mathematics
University of Bath

Propp and Wilson's coupling from the past (CFTP) algorithm provides exact samples and, thus, an elegant alternative to convergence diagnostics for standard MCMC samplers. I shall explain how this method works and discuss some practicalities regarding its use in MCMC sampling.
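To make the coupling-from-the-past idea concrete, here is a minimal Python sketch for a toy chain that is not from the talk: a reflecting random walk on the states 0..n-1, whose stationary distribution is uniform. The chain, the function names and the doubling schedule for the starting time are assumptions for illustration. The essential points are that the update is monotone, that the same random numbers are reused each time the start is pushed further into the past, and that the value at time 0 is returned once the lowest and highest chains have coalesced.

```python
import random

def step(state, u, n_states):
    """Monotone update: the same u moves every state in the same direction."""
    if u < 0.5:
        return max(0, state - 1)
    return min(n_states - 1, state + 1)

def cftp(n_states, rng):
    """Return one exact draw from the stationary law of the reflecting walk."""
    us = []           # us[k] drives the transition from time -(k+1) to time -k
    horizon = 1
    while True:
        while len(us) < horizon:
            us.append(rng.random())    # older randomness is appended, never redrawn
        low, high = 0, n_states - 1    # minimal and maximal chains started at -horizon
        for k in range(horizon - 1, -1, -1):
            low = step(low, us[k], n_states)
            high = step(high, us[k], n_states)
        if low == high:                # coalesced: the time-0 value ignores the start
            return low
        horizon *= 2                   # otherwise start further back in the past

rng = random.Random(0)
draws = [cftp(5, rng) for _ in range(2000)]
print([draws.count(s) / len(draws) for s in range(5)])
```

Because the update is monotone, only the two extreme chains need to be tracked; every other starting state is sandwiched between them.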

Unfortunately the CFTP technique is only applicable when the distribution to be sampled possesses certain special properties. We propose a way to use the method's basic idea more generally and demonstrate that our algorithm works well in some quite challenging applications. Although our method is approximate, it comes with diagnostics to help assess and control the level of approximation.

* Joint work with:
Tine Moller-Sorensen
Department of Biostatistics
University of Copenhagen


DATE/PLACE: Thursday, June 15, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Using multinomial mixture models to cluster Internet traffic
SPEAKER: Murray Jorgensen
Department of Statistics
University of Waikato
Hamilton, New Zealand

The data sets in this analysis were two ~80MB files of individual packet headers representing 120 minutes of bidirectional Internet traffic. The numbers of time-stamped packet headers in the two files were 2,153,603 and 2,066,750. Each line corresponds to one packet; the information in each column is:
1. Timestamp in seconds [931406400 = 4pm]
2. Packet length
3. Protocol id [1=ICMP, 6=TCP, 17=UDP]
4. Flow id.

The flow id is a hex number such as 0d32d150 to associate packets with the same IP number, port, etc at origin and destination. I decided to cluster the TCP flows according to their packet length distribution. The packet length distribution for each flow was summarized into frequency counts for five packet length classes. Restricting attention to TCP flows containing 100 packets or more resulted in around 2000 frequency tables from each file.

In the talk I will describe my experience in fitting this data to finite mixtures of 5-category multinomial distributions using the EM algorithm. Certain strategies were important to avoid numerical difficulties: for example it was necessary to increase the dispersion of the component distributions in the early stages of the fitting. The final models fitted had 16 and 18 components.
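The fitting procedure itself is not given in the abstract. As a hedged sketch of the general technique only, the Python code below runs plain EM for a finite mixture of multinomials on a few invented 5-category count vectors. It does not implement the dispersion-increasing strategy mentioned above, and the data, starting values and function names are assumptions.

```python
import math
import random

def safe_log(p):
    """log(p), with log(0) treated as -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def em_multinomial_mixture(counts, k, iters=100, seed=0):
    """Fit a k-component multinomial mixture to rows of category counts by EM."""
    rng = random.Random(seed)
    n_cat = len(counts[0])
    weights = [1.0 / k] * k
    probs = []
    for _ in range(k):                      # random, strictly positive start
        raw = [rng.random() + 0.5 for _ in range(n_cat)]
        probs.append([r / sum(raw) for r in raw])
    trace = []                              # log-likelihood (up to a constant) per iteration
    for _ in range(iters):
        # E-step: responsibilities, computed with the log-sum-exp trick.
        resp, loglik = [], 0.0
        for x in counts:
            logs = [safe_log(weights[c])
                    + sum(xj * safe_log(probs[c][j]) for j, xj in enumerate(x) if xj > 0)
                    for c in range(k)]
            top = max(logs)
            terms = [math.exp(l - top) for l in logs]
            resp.append([t / sum(terms) for t in terms])
            loglik += top + math.log(sum(terms))
        trace.append(loglik)
        # M-step: responsibility-weighted mixing weights and category profiles.
        for c in range(k):
            weights[c] = sum(r[c] for r in resp) / len(counts)
            totals = [sum(r[c] * x[j] for r, x in zip(resp, counts)) for j in range(n_cat)]
            if sum(totals) > 0:             # leave an empty component's profile unchanged
                probs[c] = [t / sum(totals) for t in totals]
    return weights, probs, trace

# Invented frequency tables: packet counts in five length classes for six flows.
counts = [[90, 5, 3, 1, 1], [85, 10, 2, 2, 1], [88, 6, 3, 2, 1],
          [2, 3, 5, 10, 80], [1, 2, 4, 8, 85], [3, 2, 6, 9, 80]]
weights, probs, trace = em_multinomial_mixture(counts, k=2)
print(weights)
```

The multinomial coefficient is omitted from the log-likelihood because it does not depend on the parameters; the recorded trace should never decrease from one iteration to the next.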


DATE/PLACE: Tuesday, June 13, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Nonconjugate Bayesian Analysis of One-Parameter Item Response Models
SPEAKER: Malay Ghosh
Department of Statistics
University of Florida
We present a unified Bayesian approach for the analysis of one-parameter item response models. A necessary and sufficient condition is given for the propriety of posteriors under improper priors with nonidentifiable likelihoods. Posterior distributions for item and subject parameters may be improper when the sum of the binary responses for an item or subject takes its minimum or maximum possible value. When the item parameters have a flat prior but the item totals do not fall at a boundary value, we prove the propriety of the Bayesian joint posterior under some sufficient conditions on the joint (proper) distribution of the subject parameters. The methods are implemented using Markov chain Monte Carlo and illustrated with an example from a cross-over study comparing three medical treatments. Finally, we have shown how some of these results can be carried over to the analysis of matched pairs data.
SSC 2000: Biostatistics Workshop Sunday, June 4th
9:00am to 4:00pm
University of Ottawa
Montpetit Hall 202

See the conference web site for more information.

From January through April 2000, the Biostatistics Research Group seminar series topic is Genetics and the Life Sciences. This series has been organized by Charmaine Dean and Jinko Graham of the Department of Mathematics and Statistics, Simon Fraser University. Note the times and locations of the seminars, listed below in reverse chronological order.
DATE/PLACE: Tuesday, May 2, 16:00
LSK 301, UBC
(formerly CSCI 301, UBC)
6356 Agricultural Road
TITLE: Incremental Net Benefit in Randomized Clinical Trials*
SPEAKER: Professor Andrew R. Willan
Department of Clinical Epidemiology and Biostatistics, McMaster
and Centre for Evaluation of Medicines, St Joseph's Hospital

*Note: Joint work with:
Professor D. Y. Lin
Department of Biostatistics
University of Washington

There are three approaches to health economic evaluation for comparing two therapies. These are:
(i) cost minimization, in which one assumes or observes no difference in effectiveness;
(ii) incremental cost-effectiveness; and,
(iii) incremental net benefit.
The latter can be expressed either in units of effectiveness or costs. When analyzing patient-level data from a clinical trial, expressing incremental net benefit in units of cost allows the investigator to examine all three approaches in a single graph, complete with the corresponding statistical inferences. Furthermore, if costs and effectiveness are not censored, this can be achieved using common two-sample statistical procedures. The above will be illustrated using two examples, one with censoring and one without.
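For the uncensored case, the remark about common two-sample procedures can be illustrated with a short Python sketch. The willingness-to-pay value per unit of effectiveness (called `wtp` here) is a standard ingredient of net-benefit analysis but is not named in the abstract, and the patient data are invented. Each patient's net benefit in cost units is wtp * effect - cost, and the incremental net benefit is the difference in mean net benefit between the arms.

```python
from statistics import fmean, variance

def incremental_net_benefit(treat, control, wtp):
    """Estimate incremental net benefit in cost units, with a two-sample standard error.

    treat, control: lists of (effect, cost) pairs, one per patient
    wtp: assumed monetary value of one unit of effectiveness
    """
    nb_t = [wtp * e - c for e, c in treat]
    nb_c = [wtp * e - c for e, c in control]
    inb = fmean(nb_t) - fmean(nb_c)
    se = (variance(nb_t) / len(nb_t) + variance(nb_c) / len(nb_c)) ** 0.5
    return inb, se

# Invented (effect, cost) data for three patients per arm.
treat = [(2.0, 1000.0), (3.0, 1200.0), (2.5, 1100.0)]
control = [(1.5, 500.0), (2.0, 700.0), (1.0, 600.0)]
inb, se = incremental_net_benefit(treat, control, wtp=1000.0)
print(inb, round(se, 1))
```

Here the mean differences are 1.0 in effect and 500 in cost, so the estimate equals 1000 * 1.0 - 500 = 500. Evaluating the function over a range of `wtp` values traces out the kind of single graph described above.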

DATE/PLACE: Tuesday, April 11, 2000, 16:00
Angus 31
NOTE ROOM CHANGE
TITLE: Risk Management Opportunities at the Workers' Compensation Board
TYPE: Employment Opportunities and Recruiting Talk
SPEAKER: Ella Young
Risk Manager
Workers' Compensation Board of BC

PACIFIC NORTHWEST STATISTICS CONFERENCE Friday, April 7, 2000
University of British Columbia

See the conference web site for more information.

DATE/PLACE: Thursday, April 6, 15:30
Department of Mathematics and Statistics
Simon Fraser University
Room K9509
NOTE TIME AND PLACE
TYPE: Genetics and the Life Sciences
TITLE: Salmon, Genetics, and Monte Carlo
SPEAKER: Eric Anderson
Interdisciplinary Program in Quantitative Ecology and Resource Management, University of Washington

Within the last decade, Monte Carlo methods, and in particular, Markov chain Monte Carlo techniques, have been useful in frequentist settings for computing likelihoods from complex stochastic models and in Bayesian contexts for simulating from unnormalized posterior distributions. In genetics, MCMC has been employed primarily for analyzing "family-level" data on the one hand, or for making inference of parameters relevant to evolutionary time scales on the other hand. Only more recently have such methods been applied to inference in population genetics scenarios relevant to population management. I have implemented reversible-jump MCMC in a Bayesian approach for using multi-locus genetic data to determine whether a collection of salmon is a single, interbreeding population or a mixture of two or more separate, "component" populations (for example, a wild population and a hatchery-raised population). The Bayesian approach yields posterior probabilities for the number of populations in the mixture, the allele frequencies in the component populations, and the population-of-origin of different individuals in the mixture. And, of course, the approach extends beyond salmon to other species. In the context of this inference problem I will provide some introductory background on Monte Carlo, MCMC, and reversible-jump MCMC, as well.


DATE/PLACE: Tuesday, April 4, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Transient Improvement Over Bayes Prediction Under Model Uncertainty
SPEAKER: Hubert Wong
Department of Statistics
U.B.C.
The online forecasting problem is to use the information we have up to time n to forecast the outcome at time n+1. If we have several candidate models for the distribution of the sequence of outcomes, then each one gives a forecast. So, to make our forecast, we choose one of them, or average over all of them. Existing criteria for obtaining the best choice or average are either "empirical" or "model-based". We explain the difference between these two types of criteria and introduce a new "mongrel" criterion. It combines attractive features from the other two types. Our simulation results show that the mongrel criterion gives forecasts that are more accurate than the Bayes prediction approach does across a range of data-generating models.
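As a rough illustration of the "average over all of them" option, here is a minimal sketch of a Bayesian average of one-step-ahead forecasts. It is not the mongrel criterion from the talk, whose definition is not given in the abstract. The function name, the fixed Bernoulli candidate models, and the uniform prior are all assumptions made for this example.

```python
import math


def bayes_average_forecasts(outcomes, candidate_probs):
    """One-step-ahead forecasts from a Bayesian average of Bernoulli models.

    outcomes        -- sequence of 0/1 observations
    candidate_probs -- P(outcome = 1) under each hypothetical candidate model;
                       each value must lie strictly between 0 and 1
    Returns a list whose n-th entry is the forecast probability that
    outcome n equals 1, computed from outcomes[:n] only.
    """
    k = len(candidate_probs)
    # Assumed uniform prior over the candidate models, kept on the log scale.
    log_weights = [math.log(1.0 / k)] * k
    forecasts = []
    for y in outcomes:
        # Normalize the current posterior model weights.
        top = max(log_weights)
        weights = [math.exp(lw - top) for lw in log_weights]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Forecast: posterior-weighted average of the models' forecasts.
        forecasts.append(sum(w * p for w, p in zip(weights, candidate_probs)))
        # Update each weight with the likelihood of the observed outcome.
        log_weights = [
            lw + math.log(p if y == 1 else 1.0 - p)
            for lw, p in zip(log_weights, candidate_probs)
        ]
    return forecasts
```

With two candidates, `bayes_average_forecasts([1, 1, 1], [0.2, 0.8])` starts at the prior average and moves toward the model that better explains the observed run of ones.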

DATE/PLACE: Tuesday, March 28, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Parametric Modelling of Point Process Data Arising from a Reaction Time Experiment
SPEAKER: John Braun
Department of Mathematics and Statistics
University of Winnipeg

Data from a reaction time experiment comes in the form of a bivariate point process realization. Flashes are presented to a subject according to a homogeneous Poisson process, and the subject responds by pressing a button each time the flash is seen. An objective of analysis of such data is to determine whether there are interaction effects due to pairs of consecutive flashes. We propose a simple parametric model for the eye-brain-hand system which underlies the data. In order to complete the specification, it is necessary to identify the probability density of the delay between the occurrence of a flash and the corresponding response. One way of making this identification is to examine the coherence and the cross-intensity function between the flashes and responses. Nonparametric estimators of point process intensity functions and coherence have been studied in a sequence of papers by Brillinger. These estimators exhibit the usual bias-variance trade-off. Choi and Hall (1999) have introduced data sharpening, a bias-reduction procedure for density estimation which reduces the order of magnitude of the bias while increasing the variance by a constant factor. We adapt this method to the point process problem, showing how it may be applied to the estimation of the intensity functions for one-dimensional stationary point processes as well as the coherence.

From these nonparametric estimates, it is possible to identify an adequate parametric model. Estimation of the mean delay and a parameter governing the above interaction effect can then proceed using maximum likelihood.

 

DATE/PLACE: Thursday, March 23, 15:30
Department of Mathematics and Statistics
Simon Fraser University
Room K9509
NOTE TIME AND PLACE
TYPE: Genetics and the Life Sciences
TITLE: Biostatistical Methods for the Genetic Disease Neurofibromatosis
SPEAKER: Harry Joe
Statistics, UBC


DATE/PLACE: Tuesday, March 14, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Fast Computation of Depth Contours
SPEAKER: Raymond T. Ng
Department of Computer Science
U.B.C.

DATE/PLACE: Thursday, March 9, 15:30
Department of Mathematics and Statistics
Simon Fraser University
Room K9509
NOTE TIME AND PLACE
TYPE: Genetics and the Life Sciences
TITLE: Finding Genes in Genomic Sequence: A Comparative Approach
SPEAKER: David Baillie
Biological Sciences and Molecular Biology and Biochemistry, SFU

DATE/PLACE: Tuesday, March 7, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Models for Two-state Disease Processes with Applications to Relapsing-Remitting Multiple Sclerosis
SPEAKER: Jochen Brumm
Department of Statistics
UBC
In diseases like relapsing-remitting multiple sclerosis (MS), patients experience repeated transitions between symptom-free and symptomatic disease states (the symptomatic state is called an exacerbation). Analyses for this kind of data commonly ignore the information available on the second state (the lengths of the exacerbations, for example).

In this talk, we consider models that incorporate the second state into the analyses. The basic stochastic models are Markov chains, alternating renewal processes and marked point processes. For the Markov chains and alternating renewal process models, we consider simple fixed effects models as well as random effects models where the random effects are introduced to allow for heterogeneity between patients and correlation of data on one patient. For these models, the statistical inference is based on maximum likelihood. For the marked point process model, we use a generalized estimating equation approach.

We apply these models to a data set from an MS clinical trial. The aim of the analyses is to relate the available covariates to the disease process. We do not attempt a comprehensive analysis of the data set; rather, the aim here is to see what can be achieved and which questions can be addressed with the different models.


DATE/PLACE: Tuesday, February 29, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: A Model Selection Approach to Partially Linear Regression
SPEAKER: Florentina Bunea
Department of Statistics
University of Washington
We extend the model selection approach proposed by Barron, Birgé and Massart (1999) for nonparametric regression to partially linear regression. That is, we consider the model Y = X'ß + f(T) + W, where X belongs to R^q, T belongs to R, W is the error independent of (X,T) and f is a function of unknown smoothness. This model has received considerable attention in the literature, and it is mostly used in cases in which the parameter of interest is the linear component, whereas the variables appearing in the nonlinear part are viewed as confounders, hence f is regarded as a nuisance parameter. We study this model in two different cases.

Case A. The number of covariates appearing in the linear part is a priori given, say q.

Case B. We have available q possible regressors for the linear part, but only an unknown (possibly much smaller) subset of them are relevant for Y, hence we would like to select it.

We propose a penalized least squares approach and, in both cases, we obtain finite sample upper bounds for the risk of the estimator and, as a consequence, the consistency of the estimator of f at the optimal nonparametric rate. We also discuss the distributional properties of hat(ß) in cases A and B and show that sqrt(n) consistency of hat(ß) can be achieved in both cases.


DATE/PLACE: Thursday, February 24, 15:30
Department of Mathematics and Statistics
Simon Fraser University
Room K9509
NOTE TIME AND PLACE
TYPE: Genetics and the Life Sciences
TITLE: Statistical Modelling of Species Occurrences
SPEAKER: Fangliang He
Canadian Forest Service, Pacific Forestry Centre, Victoria

Because of the difficulty and high cost of conducting species surveys, information on a species at landscape or regional scales is usually limited to a map of its presence or absence in recording units over a specified time frame. Various species data at large scales are increasingly documented in this presence/absence format. These types of data have recently attracted a great deal of attention from ecologists and statisticians. In this talk I will briefly introduce three issues related to this type of data. 1) Prediction of species occurrence: This is basically to restore an image (the geographical distribution of a species), or, say, to predict the occurrence of a species in an area based on known occurrence data in nearby areas. 2) Autologistic regression: To model species occurrence using explanatory variables, an MCMC algorithm will be discussed. This technique is useful for modelling the effects of climate change on species distribution. 3) Estimating abundance from occurrence: In a classical occupancy problem, we throw N balls into M boxes and want to know the number of empty boxes. Here I am interested in the reverse problem: how many balls were thrown, given M boxes and u empty boxes? Solutions to this problem will be used to estimate population abundances for 800 species in a 50 ha (500x1000 m) tropical rainforest plot in Malaysia, which is divided into a grid system. Each grid cell is considered as a box and the occupancy status of each cell is known.
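The reverse occupancy problem in item 3 can be illustrated with a simple moment estimator. This is only a sketch under the textbook assumption that balls land independently and uniformly at random, so that the expected number of empty boxes is M(1 - 1/M)^N. It is not necessarily the solution presented in the talk, and the function names are hypothetical.

```python
import math


def expected_empty_boxes(n_balls, n_boxes):
    """Expected number of empty boxes after throwing n_balls uniformly
    at random into n_boxes boxes: M * (1 - 1/M)**N."""
    return n_boxes * (1.0 - 1.0 / n_boxes) ** n_balls


def estimate_balls(n_boxes, n_empty):
    """Moment estimate of the number of balls thrown, obtained by solving
    u = M * (1 - 1/M)**N for N, given M boxes and u empty boxes.

    Needs at least two boxes and at least one empty box; with no empty
    boxes the equation has no finite solution.
    """
    if n_boxes < 2 or not 0 < n_empty <= n_boxes:
        raise ValueError("need n_boxes >= 2 and 0 < n_empty <= n_boxes")
    return math.log(n_empty / n_boxes) / math.log(1.0 - 1.0 / n_boxes)
```

In the ecological setting, each grid cell plays the role of a box and each individual the role of a ball. Real individuals tend to cluster, so the uniform-throw assumption would usually understate abundance.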


DATE/PLACE: WEDNESDAY, February 23, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Atmospheric Data, Nonparametric Methods and Bayesian Interpretations
SPEAKER: Claudia Tebaldi
Geophysical Statistics Project
Climate and Global Dynamics Division
National Center for Atmospheric Research
Boulder, CO 80302-3000
The motivation for this work is from two seemingly unrelated problems in atmospheric science: forecasting clear air turbulence and quantifying the dynamical properties of a climate model.

The common denominator of these two very different applications is the use of nonparametric tools -- MARS (Multivariate Adaptive Regression Splines) and Neural Networks -- to deal with high dimensional data. The essence of each of these methods is data-driven model selection through cross-validation and a useful way to understand such methods is in a Bayesian context.

The Bayesian analogue evaluates a posterior probability distribution over the space of models based on the data AND a prior probability distribution on that space. The fundamental question is "what prior is implicitly assumed by model selection criteria like GCV (Generalized Cross-Validation) and by adaptive tools like MARS and NNs?"

This question can be answered in two ways. One is by following the mainstream of the Bayesian model selection industry. The other adopts a 'reverse engineering' perspective in reconstructing the prior assumptions via simulation with respect to the MARS and NN estimators.


DATE/PLACE: Tuesday, February 15, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Nonlinear Mixed-Effect Models with Missing Time-Dependent Covariates, with Application to HIV Viral Dynamics
SPEAKER: Lang Wu
Department of Biostatistics
Harvard School of Public Health
The study of HIV viral dynamics is an important area in AIDS research. Nonlinear mixed-effect models have been proposed for modeling viral dynamic processes. A challenging problem in the modeling is to identify repeatedly measured, but possibly missing, immunologic or virologic markers (covariates) for viral dynamic parameters. For missing time-dependent covariates in nonlinear mixed-effect models, the commonly used methods may give misleading results. We propose a multiple imputation method which imputes the missing data at the individual level but can pool information across individuals. In situations where the covariate trajectories exhibit distinctive and important patterns, we also propose an alternative imputation method based on modeling the covariate processes. We compare various methods by Monte Carlo simulations, and find that the proposed multiple imputation (MI) method performs the best in terms of bias and mean-squared error in the estimates of covariate coefficients. A real dataset is analyzed based on the proposed methods.

DATE/PLACE: Thursday, February 10, 15:30
Department of Mathematics and Statistics
Simon Fraser University
Room K9509
NOTE TIME AND PLACE
TYPE: Genetics and the Life Sciences
TITLE: Maximum Likelihood Estimation for Seed and Pollen Dispersal Parameters
SPEAKER: Beatrix Jones
Statistics, University of Washington

The direction and distance that seeds and pollen travel are important demographic parameters for plant populations. Estimates of the distributions characterizing seed and pollen movement are important in understanding the fine scale genetic structure of a population and the ecological factors that are relevant to its persistence. In addition to conservation of populations, understanding these factors can aid in gauging the impact of the release of genetically engineered or non-native organisms. I present a method for inferring the maximum likelihood estimates of parametric distributions for seed and pollen dispersal distances using genetic data and spatial information at two consecutive generations. Likelihoods are obtained by using Monte Carlo methods to approximate a sum over a large number of discrete latent variables.


DATE/PLACE: Thursday, February 3, 15:30 CHANGED FROM THURSDAY JANUARY 27
Department of Mathematics and Statistics
Simon Fraser University
Room K9509
NOTE TIME AND PLACE
TYPE: Genetics and the Life Sciences
TITLE: A Linkage Disequilibrium (LD) Study in the Newfoundland Population Reduces the Bardet-Biedl Syndrome (I) (BBS1) Interval to 1 cM
SPEAKERS: Terry-Lynn Young, Medicine, Memorial University
& William S. Davidson, Molecular Biology and Biochemistry and Dean of Science, SFU


DATE/PLACE: Tuesday, February 1, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Statistical Methods for Assessing Habitat Preferences
SPEAKER: Dieter Ayers
Computing & Telecom Services
University of Northern B.C.
It is often the case that samples are taken in a non-random fashion. This thesis attempts to define a methodology by which some analysis can be performed on a specific kind of non-random sample. In studies of wildlife behaviour, a common method of sampling involves tagging an animal and relocating it in subsequent time periods. Considering the number or type of animal present at sampling locations as a random sample is erroneous, as the locations were chosen by the animal and not in a random fashion. Further, with data such as these it is not possible to draw conclusions about areas where no animals were observed, as it is unknown whether these areas were truly free of animals.

We treat the observed data as conditional on the presence of an animal, and then use a Bayesian approach to estimate the probability of finding animals in any given location. This results in a method that allows for mapping the propensity of a certain area to be chosen by an animal in the future.


DATE/PLACE: Tuesday, January 18, 2000, 16:00
Leonard S. Klinck 301
(formerly Computer Science 301),
6356 Agricultural Road
TITLE: Over-dispersion in Count Data
SPEAKER: A. H. EL-Shaarawi
National Water Research Institute
Burlington, Ontario, Canada L7R 4A6
Analysis of data in the environmental sciences often leads to a phenomenon referred to as over-dispersion. This occurrence is particularly common in count data, resulting in larger residual variations than the expected model variation. Approaches for detecting and estimating over-dispersion will be presented and applied to a number of environmental data sets.
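One common first check for over-dispersion, shown here only as a minimal sketch and not necessarily among the approaches covered in the talk, compares the sample variance of the counts with their mean. Under a Poisson model with a common mean the ratio should be close to 1. The function name is hypothetical.

```python
def dispersion_index(counts):
    """Ratio of the sample variance to the sample mean of a set of counts.

    For independent Poisson counts with a common mean the ratio is near 1;
    values well above 1 suggest over-dispersion relative to that model.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; the index is undefined")
    # Unbiased sample variance (divisor n - 1).
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean
```

Multiplying the index by n - 1 gives the Pearson chi-square statistic for this simple common-mean model, which can be compared with a chi-square distribution on n - 1 degrees of freedom.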

DATE/PLACE: Thursday, January 13, 15:30
BC Cancer Research Centre's Lecture Theatre
601 W.10th Ave.
NOTE TIME AND PLACE
TYPE: Genetics and the Life Sciences
TITLE: Bioinformatics at the Genome Sequence Centre and Beyond
SPEAKER: Steven Jones
Genome Sequence Centre, B.C. Cancer Agency
The term bioinformatics has arisen to encompass the growing field of computational biology. There has been a recent explosion in the ability to determine new DNA and protein sequences. However, determining functions for these sequences through experimental biology has remained expensive and time consuming. This has meant that computational prediction is now the only tool used in deriving meaning from the majority of biological sequences. The genes of complex organisms, such as humans, can differ in their expression both temporally and spatially and will be under the influence of many cellular factors. In addition, the process of identifying the existence of genes and their correct gene structure remains problematic, let alone the putative function of any resulting protein. I will describe some of the problems which bioinformatics is trying to address in the analysis of DNA sequence as well as some of the other bioinformatic problems being addressed at the Genome Sequence Centre.


Department of Statistics

333-6356 Agricultural Road
Vancouver, BC, V6T 1Z2
Tel: 604.822.0570
Fax: 604.822.6960