Department Seminars 2000
| DATE/PLACE: | Tuesday, November 28, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | CAST: A Computer-based Resource for Teaching Statistics |
| SPEAKER: | Doug Stirling Institute of Information Sciences and Technology Massey University New Zealand |
|
The wide availability of fast computers has had an enormous impact on introductory statistics courses. Computers initially allowed students to apply numerical and graphical methods to realistic data sets, thereby reducing the emphasis on numerical algorithms and formulae. More recently, it has been recognised that computers also have great potential for teaching statistical concepts, adding to their role as sophisticated calculators. Programs such as Minitab and SAS lack features required for teaching concepts. For example, we might want to ...
While data-analysis programs can perform some of the above, they can rarely make the mechanism clear to students. Models, sampling and empirical distributions must be first-class citizens in software used for teaching statistical concepts. This talk will demonstrate CAST, a computer-based resource that is designed to teach statistical concepts. CAST is accessed using a web browser and contains both expository text and over 300 small programs (applets) that do most of the teaching. The applets share an extensive framework of code that is designed for teaching statistical concepts. CAST can be described as a textbook with dynamic, interactive diagrams. Since students must interact with each page, it is claimed that their attention is retained and learning is improved. Even in lectures, the diagrams are effective ways to teach most statistical concepts. |
|
| DATE/PLACE: | Tuesday, November 21, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Comparison of HIV type-specific infectivities from competing risks failure time data |
| SPEAKER: | Prof. Peter Gilbert Department of Biostatistics Harvard University, Boston |
| To assist in the design of HIV vaccines, it is helpful to know if and how various HIV genotypes and phenotypes differ in their infectivity as defined by the per-exposure transmission probability. For male-to-female sexual exposure, this question is addressed for HIV-1 versus HIV-2 through analysis of standard competing risks failure time data from a 15-year prospective cohort study of female commercial sex workers in Dakar, Senegal. Estimation of the HIV-1/HIV-2 infectivity ratio over time is based on nonparametric estimation of the HIV-1/HIV-2 infection hazard ratio over time adjusted by estimates of the HIV-1/HIV-2 prevalence ratio in the infected exposing male partner population. Hypothesis testing is based on a test process given by a weighted difference of estimates of cumulative type-specific hazard rates adjusted for estimates of the HIV-1/HIV-2 partner prevalence ratio. Under proportional hazards assumptions, the estimation and testing procedures can adjust for time-dependent risk factors. The analysis provides evidence that HIV-1 is more infectious than HIV-2. | |
| DATE/PLACE: | Thursday, November 23, 2000, 16:00 LSK 301 (formerly CSCI 301) 6356 Agricultural Road, UBC |
| TYPE: | Research Seminar |
| TITLE: | Analysis of the Growth Curve Model Using Quasi-Least Squares |
| SPEAKER: | N. Rao Chaganty Mathematics and Statistics, Old Dominion University |
|
|
|
| DATE/PLACE: | Tuesday, November 21, 2000, 16:00 LSK 301 (formerly CSCI 301) 6356 Agricultural Road, UBC |
| TYPE: | Joint Research Seminar |
| TITLE: | Comparison of HIV Type-Specific Infectivities from Competing Risks Failure Time Data |
| SPEAKER: | Peter Gilbert Biostatistics, Harvard School of Public Health |
| To assist in the design of HIV vaccines, it is helpful to know if and how various HIV genotypes and phenotypes differ in their infectivity as defined by the per-exposure transmission probability. For male-to-female sexual exposure, this question is addressed for HIV-1 versus HIV-2 through analysis of standard competing risks failure time data from a 15-year prospective cohort study of female commercial sex workers in Dakar, Senegal. Estimation of the HIV-1/HIV-2 infectivity ratio over time is based on nonparametric estimation of the HIV-1/HIV-2 infection hazard ratio over time adjusted by estimates of the HIV-1/HIV-2 prevalence ratio in the infected exposing male partner population. Hypothesis testing is based on a test process given by a weighted difference of estimates of cumulative type-specific hazard rates adjusted for estimates of the HIV-1/HIV-2 partner prevalence ratio. Under proportional hazards assumptions, the estimation and testing procedures can adjust for time-dependent risk factors. The analysis provides evidence that HIV-1 is more infectious than HIV-2.
|
|
| DATE/PLACE: | Tuesday, November 14, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Minimax robust regression designs |
| SPEAKER: | Prof. Julie Zhou University of Victoria |
| This talk gives a review of classical regression designs for controlled experiments and discusses the need to study the corresponding robust designs. Two commonly used methods in robust statistics are the minimax and infinitesimal approaches. We define robust design problems for approximately linear models with correlated errors using the minimax approach. Since analytical (continuous) robust designs are usually hard to derive, we will introduce a simulated annealing algorithm to search for discrete robust designs. In many cases in which continuous robust designs have not been solved, discrete robust designs can be obtained by applying the annealing algorithm. Two examples will be given to show discrete robust designs. | |
| DATE/PLACE: | Thursday, November 9, 2000, 16:00 LSK 301 (formerly CSCI 301) 6356 Agricultural Road, UBC |
| TYPE: | Research Seminar |
| TITLE: | Clinical Trials Conduct and Roles of Trial Statisticians |
| SPEAKER: | Yong Hao, MD, PhD QLT Inc., Vancouver |
| Clinical trials should be conducted with adherence to the highest possible ethical and scientific standards so that the rights of the trial subjects are fully protected and the trial results reflect the true science. Commonly adopted administrative and operational structures necessary for the ethical and scientific conduct of clinical trials will be introduced and discussed in this presentation. Some operational details and the role of a trial statistician will also be discussed.
|
|
| DATE/PLACE: | Tuesday, November 7, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Statistical analysis of repeated measurements with informative censoring times |
| SPEAKER: | Prof. Peter (Xuekun) Song Mathematics and Statistics Department York University |
| Incomplete repeated measurement data frequently arise in medical studies. In this situation, a problem that one may face, and that has recently attracted a lot of attention, is that the incompleteness or missingness of repeated measurements is informative or related to the underlying variable of interest. To attack the problem, we propose some nonparametric and semiparametric methods, which are distribution free and can be easily implemented. The proposed methods are evaluated by numerical studies and applied to data from a clinical trial of adult schizophrenics. | |
| DATE/PLACE: | Tuesday, October 24, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | How do I love thee? Let me count the ways: Counting elusive populations using capture-recapture methods. |
| SPEAKER: | Prof. Carl Schwarz Department of Statistics and Actuarial Science Simon Fraser University |
| Capture-recapture methods have a long history of being used to estimate the abundance of animal populations. But they can be used in many other situations as well. This talk will present an overview of the theory of capture-recapture methods illustrated with applications to such populations as the number of love poems penned by a poet, the number of people served by a health district, the number of plants in a field, and the number of taxi cabs in a large city. | |
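The basic two-sample idea behind capture-recapture can be illustrated with a short sketch. It uses the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant; these are standard textbook estimators and are not necessarily the ones presented in the talk. The function and argument names are illustrative assumptions.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate population size from a two-sample capture-recapture study.

    marked_first  -- number of individuals caught and marked in sample 1
    caught_second -- number of individuals caught in sample 2
    recaptured    -- number in sample 2 that already carried a mark
    """
    if recaptured <= 0:
        raise ValueError("the estimator is undefined with no recaptures")
    return marked_first * caught_second / recaptured


def chapman(marked_first, caught_second, recaptured):
    """Chapman's variant, which stays finite even with zero recaptures."""
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Hypothetical numbers: 100 marked, 80 caught later, 20 of them marked.
    print(lincoln_petersen(100, 80, 20))
    print(chapman(100, 80, 20))
```

The same arithmetic applies whenever two incomplete "lists" of a population overlap, which is why the method carries over to settings such as poems, patients or taxi cabs.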
| DATE/PLACE: | Tuesday, October 17, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Empirical Envelope MLE |
| SPEAKER: | Prof. Mai Zhou Dept of Statistics University of Kentucky |
|
Suppose the treatment and placebo survival distributions follow a proportional hazards model only after an unknown shift in location. How shall we estimate the shift as well as the proportion in hazards? This motivates the envelope MLE method. The nonparametric maximum likelihood estimators (NPMLE) and the empirical likelihood ratio statistic sometimes do not exist in semiparametric settings. However, the NPMLE often exists in an enlarged parameter space. We propose to gradually shrink the enlarged parameter space by imposing more and more constraints. This results in the envelope MLE. The approach is a counterpart of the sieve MLE (Grenander 1981). We shall present one case in detail: (1) the location problem, where several samples are from the same (unknown) distribution except for different locations. We shall briefly mention other models that can be treated by envelope MLE. A Wilks-type theorem for the empirical envelope likelihood ratio statistic and the asymptotic distribution of the location estimator are provided. |
|
| DATE/PLACE: | Tuesday, October 10, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Robust inference for the simple linear regression and location models. |
| SPEAKER: | Prof. Jorge Adrover Department of Statistics University of British Columbia |
|
We propose robust confidence intervals and p-values for linear combinations of the parameters of a simple linear regression model. We search for robust procedures which are stable and yet informative. For instance, in the case of confidence intervals, we wish to construct intervals that are stable in the sense of achieving coverages near the nominal one even in the presence of outliers and other departures from the parametric model. Moreover, we wish to obtain intervals which are informative in the sense of having relatively short lengths. The problem of getting stable and informative confidence intervals has received attention in the literature (see Barnett and Lewis, Outliers in Statistical Data, 1994, p. 74 and references therein). However, the proposals so far seem to neglect the bias of the estimator, which turns out to be crucial for achieving stable confidence intervals. To achieve the goals of stability and informativeness with our approach we need robust point estimates with the following properties: (1) asymptotic normality under general conditions; (2) known asymptotic bias bounds (for the intercept and slope parameters). We consider some median-based estimates of the slope, for instance, Brown and Mood's estimate (1951), Siegel's repeated median of slopes (1982) and Theil and Sen's pairwise median of slopes (1951, 1968). We show that, to some extent, these estimates satisfy the requirements (1) and (2) above. We show that the proposed robust confidence intervals constitute an improvement over the intervals constructed around a robust point estimate using its asymptotic distribution under the "target" parametric model. The location-scale model is also considered. In this case, we may take advantage of some additional information on the sign of the bias so as to yield shorter confidence intervals. |
|
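Of the median-based slope estimates named in the abstract above, the Theil and Sen pairwise median of slopes is the simplest to write down. The following is a minimal sketch of that point estimate only; it does not implement the robust confidence intervals proposed in the talk. Taking the intercept as the median of `y - slope * x` is one common convention and is an assumption here, as are the function names.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of the slopes through all pairs of points with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return median(slopes)


def theil_sen_fit(x, y):
    """Return (intercept, slope); the intercept rule is an assumed convention."""
    slope = theil_sen_slope(x, y)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


if __name__ == "__main__":
    # Four points on y = 1 + 2x plus one gross outlier.
    x = [0, 1, 2, 3, 4]
    y = [1, 3, 5, 7, 100]
    print(theil_sen_fit(x, y))
```

In the small example the outlier affects four of the ten pairwise slopes, so the median of slopes is unchanged, whereas a least squares fit would be pulled strongly towards the outlier.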
| DATE/PLACE: | Tuesday, September 26, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Ordered multivariate extremes |
| SPEAKER: | Saralees Nadarajah Dept of Statistics and Applied Probability University of California, Santa Barbara |
|
In recent years statistical extreme value theory has matured to the extent that it contributes usefully to the study of substantial real problems, particularly in the area of environmental extremes. Examples include the design of off-shore structures (Coles and Tawn, 1994) and the study of reservoir flood safety (Anderson and Nadarajah, 1993). A fairly commonly occurring characteristic is that the variables whose extremes are of interest are ordered. In hydro-meteorology one thing that is of interest is the dependence of extreme values of d-hour rainfall over a range of values of d. One approach is to fit a multivariate extreme value distribution over that range. If X(d) denotes rainfall aggregated over d hours, and if d' > d then X(d) <= X(d') <= (d'/d) X(d) for all (X(d), X(d')), so an order restriction in the multivariate extreme value model is needed. Similar order restrictions arise in the study of the joint distributions of large hourly mean wind speeds and large wind gusts. The aim of this talk is to develop multivariate extremal models and associated statistical procedures for vector observations whose components are subject to an order relationship. We consider only the bivariate case. The results are applied to the joint analysis of rainfall extremes corresponding to different durations. |
|
PIMS / MITACS Seminar Series On
Computational Statistics and Data Mining
| DATE/PLACE: | Friday, September 22, 2000, 13:30 Leonard S. Klinck 301 (formerly Computer Science 301) Note Time and Place |
| TITLE: | Depth Tests of Symmetry and Regression |
| SPEAKER: | Professor Peter J. Rousseeuw Department of Mathematics and Computer Science Universitaire Instelling Antwerpen, Belgium |
| DATE/PLACE: | Thursday, September 21, 2000, 16:00 CICSR 208, 2366 Main Mall Note Time and Place |
| TITLE: | An Introduction to Regression Depth |
| SPEAKER: | Professor Peter J. Rousseeuw Department of Mathematics and Computer Science Universitaire Instelling Antwerpen, Belgium |
|
In this talk we introduce a notion of depth in the regression setting. It provides the "rank" of any line (plane), rather than ranks of observations or residuals. In simple regression we can compute the depth of any line by an O(n log n) algorithm. For any bivariate data set Zn of size n there exists a line with depth at least n/3. The largest depth in Zn can be used as a measure of linearity versus convexity. In both simple and multiple regression we consider the deepest fit, which generalizes the univariate median and is equivariant for monotone transformations of the response. Throughout, the errors may be skewed and non-identically distributed (e.g. heteroskedastic). We also construct depth-based regression quantiles. They estimate the quantiles of y given x, as do the L1-based regression quantiles, but can withstand the effect of leverage points. Using the concept of regression depth, we obtain some new results in discrete geometry. |
|
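To make the idea of the depth of a line concrete, here is a naive sketch for simple regression. It assumes the commonly stated definition in which, for each split point v on the x-axis, one counts the observations that would have to be removed before the line could be tilted about v into a vertical position without passing any observation; the depth is the smallest such count. This is an O(n^2) illustration, not the O(n log n) algorithm mentioned in the abstract, and the function name and tolerance argument are assumptions.

```python
def regression_depth(intercept, slope, x, y, tol=1e-12):
    """Naive regression depth of the line y = intercept + slope * x."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    nonneg = [r >= -tol for r in residuals]   # on or above the line
    nonpos = [r <= tol for r in residuals]    # on or below the line

    depth = len(x)
    for v in set(x):
        # Points to the left of (or at) the split, and to the right of it.
        left_plus = sum(1 for xi, p in zip(x, nonneg) if xi <= v and p)
        left_minus = sum(1 for xi, p in zip(x, nonpos) if xi <= v and p)
        right_plus = sum(1 for xi, p in zip(x, nonneg) if xi > v and p)
        right_minus = sum(1 for xi, p in zip(x, nonpos) if xi > v and p)
        depth = min(depth, left_plus + right_minus, right_plus + left_minus)
    return depth


if __name__ == "__main__":
    x = [0, 1, 2, 3]
    y = [0, 1, 2, 3]
    print(regression_depth(0.0, 1.0, x, y))    # line through every point
    print(regression_depth(-10.0, 0.0, x, y))  # line below every point
```

Under this definition a line passing through all observations has depth n, while a line lying entirely below (or above) the data has depth 0.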
| DATE/PLACE: | Tuesday, September 12, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Expanding the Capabilities of B-WISE |
| SPEAKER: | Nathan Johnson Department of Statistics University of British Columbia |
|
The B-WISE (Bayesian regression With Interactions and Smooth Effects) method of regression modelling was designed to improve upon other flexible regression schemes in the areas of model interpretability and ease of implementation. B-WISE models have been shown to have good predictive performance, and performance can be even further improved with a natural Bayesian model averaging scheme. In this presentation I will outline some ways in which some of the constraints inherent in a B-WISE model can be relaxed, so that the technique can be used in more general situations and with even greater flexibility. We will see that much of the interpretability of B-WISE models is retained and implementation is still relatively straightforward. |
|
| DATE/PLACE: | Thursday, August 17, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Models for the development of tumours in Neurofibromatosis 2 |
| SPEAKER: | Ryan Woods Department of Statistics University of British Columbia |
|
Neurofibromatosis 2 (NF2) is a rare genetic disease that affects approximately 1 in 40,000 people. Some of the characteristic features of this disease include the onset of multiple tumours on the cranial and spinal nerves, juvenile cataracts and hearing loss. Almost all affected individuals develop bilateral tumours of the Schwann cells that line the vestibular nerves; these tumours are called vestibular schwannomas (VS). Evidence from molecular genetic studies has suggested that a "2-hit" hypothesis is appropriate for the development of VS in patients with NF2; that is to say that a tumour cell develops from a normal Schwann cell after the cell sustains two mutations to its genetic material. Several authors have proposed probabilistic models for this process and have shown that such models are consistent with incidence data for several different types of cancer. We will discuss a selection of probabilistic models for a "2-hit" hypothesis and present the results from the fitting of such models to incidence data from NF2 patients. Molecular evidence does not exclude the possibility that additional hits are necessary for the development of VS. We will therefore discuss a "3-hit" model and compare this model's fit to both the data and to the fit of the "2-hit" models. Genotype-phenotype correlations have been reported in patients with NF2 and thus a model that incorporates a patient's genotype is presented. Finally, a bivariate model is proposed to estimate the distributions of the ages at onset of both the first and second VS. All of these models will be presented with minimal mathematical detail; emphasis will be on the application of such models to patient data and on the results. |
|
| DATE/PLACE: | Thursday, July 20, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Approximate exact sampling: Towards the general application of Propp and Wilson's algorithm* |
| SPEAKER: | Professor Chris Jennison Statistics Group School of Mathematics University of Bath |
|
Propp and Wilson's coupling from the past (CFTP) algorithm provides exact samples and, thus, an elegant alternative to convergence diagnostics for standard MCMC samplers. I shall explain how this method works and discuss some practicalities regarding its use in MCMC sampling. Unfortunately the CFTP technique is only applicable when the distribution to be sampled possesses certain special properties. We propose a way to use the method's basic idea more generally and demonstrate that our algorithm works well in some quite challenging applications. Although our method is approximate, it comes with diagnostics to help assess and control the level of approximation. * Joint work with: Tine Moller-Sorensen, Department of Biostatistics, University of Copenhagen |
|
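For readers unfamiliar with coupling from the past, the following is a minimal sketch of the standard monotone version on a toy chain: a random walk on {0, ..., top} that steps up with probability `p_up` and otherwise steps down, holding at the boundaries. The toy chain, the function names and the doubling schedule are illustrative assumptions; this is not the approximate method proposed in the talk.

```python
import random


def update(state, u, p_up, top):
    """Monotone update rule: the same u moves every state the same way."""
    if u < p_up:
        return min(state + 1, top)
    return max(state - 1, 0)


def cftp_sample(p_up, top, rng):
    """Draw one exact sample from the stationary distribution via CFTP."""
    us = []       # us[k] drives the transition from time -(k+1) to time -k
    horizon = 1
    while True:
        # Extend further into the past, reusing the randomness already drawn.
        while len(us) < horizon:
            us.append(rng.random())
        low, high = 0, top  # minimal and maximal chains started at -horizon
        for k in range(horizon - 1, -1, -1):
            low = update(low, us[k], p_up, top)
            high = update(high, us[k], p_up, top)
        if low == high:     # coalescence: every start agrees at time 0
            return low
        horizon *= 2


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [cftp_sample(0.3, 4, rng) for _ in range(2000)]
    print(sum(1 for d in draws if d == 0) / len(draws))
```

The two essential points are that the random numbers attached to each past time step are reused when the starting time is pushed further back, and that monotonicity lets the top and bottom chains stand in for all starting states. For this toy chain the stationary probabilities are proportional to (p_up / (1 - p_up))^i, which gives a simple check on the output.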
| DATE/PLACE: | Thursday, June 15, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Using multinomial mixture models to cluster Internet traffic |
| SPEAKER: | Murray Jorgensen Department of Statistics University of Waikato Hamilton, New Zealand |
|
The data sets in this analysis were two ~80MB files of individual packet headers representing 120 minutes of bidirectional Internet traffic. The numbers of time-stamped packet headers in the two files were 2,153,603 and 2,066,750. Each line corresponds to one packet, the information in each column is: The flow id is a hex number such as 0d32d150 to associate packets with the same IP number, port, etc. at origin and destination. I decided to cluster the TCP flows according to their packet length distribution. The packet length distribution for each flow was summarized into frequency counts for five packet length classes. Restricting attention to TCP flows containing 100 packets or more resulted in around 2000 frequency tables from each file. In the talk I will describe my experience in fitting this data to finite mixtures of 5-category multinomial distributions using the EM algorithm. Certain strategies were important to avoid numerical difficulties: for example, it was necessary to increase the dispersion of the component distributions in the early stages of the fitting. The final models fitted had 16 and 18 components. |
|
| DATE/PLACE: | Tuesday, June 13, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Nonconjugate Bayesian Analysis of One-Parameter Item Response Models |
| SPEAKER: | Malay Ghosh Department of Statistics University of Florida |
| We present a unified Bayesian approach for the analysis of one-parameter item response models. A necessary and sufficient condition is given for the propriety of posteriors under improper priors with nonidentifiable likelihoods. Posterior distributions for item and subject parameters may be improper when the sum of the binary responses for an item or subject takes its minimum or maximum possible value. When the item parameters have a flat prior but the item totals do not fall at a boundary value, we prove the propriety of the Bayesian joint posterior under some sufficient conditions on the joint (proper) distribution of the subject parameters. The methods are implemented using Markov chain Monte Carlo and illustrated with an example from a cross-over study comparing three medical treatments. Finally, we have shown how some of these results can be carried over to the analysis of matched pairs data. | |
| SSC 2000: Biostatistics Workshop | Sunday, June 4th 9:00am to 4:00pm University of Ottawa Montpetit Hall 202 See the conference web site for more information. |
From January through April 2000, the Biostatistics Research Group seminar series topic is Genetics and the Life Sciences. This series has been organized by Charmaine Dean and Jinko Graham of the Department of Mathematics and Statistics, Simon Fraser University. Note the times and locations of the seminars, listed below in reverse chronological order. |
| DATE/PLACE: | Tuesday, May 2, 16:00 LSK 301, UBC (formerly CSCI 301, UBC) 6356 Agricultural Road |
| TITLE: | Incremental Net Benefit in Randomized Clinical Trials* |
| SPEAKER: | Professor Andrew R. Willan Department of Clinical Epidemiology and Biostatistics, McMaster and Centre for Evaluation of Medicines, St Joseph's Hospital |
|
There are three approaches to health economic evaluation for comparing two therapies. These are: (i) cost minimization, in which one assumes or observes no difference in effectiveness; (ii) incremental cost-effectiveness; and, (iii) incremental net benefit. The latter can be expressed either in units of effectiveness or costs. When analyzing patient-level data from a clinical trial, expressing incremental net benefit in units of cost allows the investigator to examine all three approaches in a single graph, complete with the corresponding statistical inferences. Furthermore, if costs and effectiveness are not censored, this can be achieved using common two-sample statistical procedures. The above will be illustrated using two examples, one with censoring and one without. * Joint work with: Professor D. Y. Lin Department of Biostatistics University of Washington |
|
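The relationship between the incremental cost-effectiveness ratio and incremental net benefit in the abstract above can be written compactly. The notation is an assumption and does not come from the abstract: \(\Delta_e\) and \(\Delta_c\) denote the differences in mean effectiveness and mean cost between the two therapies, and \(\lambda\) is the amount a decision maker is willing to pay per unit of effectiveness.

```latex
\[
  \mathrm{ICER} = \frac{\Delta_c}{\Delta_e}, \qquad
  b_c(\lambda) = \lambda\,\Delta_e - \Delta_c, \qquad
  b_e(\lambda) = \Delta_e - \frac{\Delta_c}{\lambda} = \frac{b_c(\lambda)}{\lambda}.
\]
```

Here \(b_c\) is the net benefit in units of cost and \(b_e\) the net benefit in units of effectiveness. When \(\Delta_e > 0\), the condition \(b_c(\lambda) > 0\) is equivalent to \(\mathrm{ICER} < \lambda\), and setting \(\lambda = 0\) gives \(b_c(0) = -\Delta_c\), the cost comparison alone. This is one way to see how a plot of \(b_c(\lambda)\) against \(\lambda\) can display all three approaches together.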
| DATE/PLACE: | Tuesday, April 11, 2000, 16:00 Angus 31 NOTE ROOM CHANGE |
| TITLE: | Risk Management Opportunities at the Workers' Compensation Board |
| TYPE: | Employment Opportunities and Recruiting Talk |
| SPEAKER: | Ella Young Risk Manager Workers' Compensation Board of BC |
| PACIFIC NORTHWEST STATISTICS CONFERENCE | Friday, April 7, 2000 University of British Columbia See the conference web site for more information. |
| DATE/PLACE: | Thursday, April 6, 15:30 Department of Mathematics and Statistics Simon Fraser University Room K9509 NOTE TIME AND PLACE |
| TYPE: | Genetics and the Life Sciences |
| TITLE: | Salmon, Genetics, and Monte Carlo |
| SPEAKER: | Eric Anderson Interdisciplinary Program in Quantitative Ecology and Resource Management, University of Washington |
|
Within the last decade, Monte Carlo methods, and in particular, Markov chain Monte Carlo techniques, have been useful in frequentist settings for computing likelihoods from complex stochastic models and in Bayesian contexts for simulating from unnormalized posterior distributions. In genetics, MCMC has been employed primarily for analyzing "family-level" data on the one hand, or for making inference of parameters relevant to evolutionary time scales on the other hand. Only more recently have such methods been applied to inference in population genetics scenarios relevant to population management. I have implemented reversible-jump MCMC in a Bayesian approach for using multi-locus genetic data to determine whether a collection of salmon is a single, interbreeding population or a mixture of two or more separate, "component" populations (for example, a wild population and a hatchery-raised population). The Bayesian approach yields posterior probabilities for the number of populations in the mixture, the allele frequencies in the component populations, and the population-of-origin of different individuals in the mixture. And, of course, the approach extends beyond salmon to other species. In the context of this inference problem I will provide some introductory background on Monte Carlo, MCMC, and reversible-jump MCMC, as well. |
|
| DATE/PLACE: | Tuesday, April 4, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Transient Improvement Over Bayes Prediction Under Model Uncertainty |
| SPEAKER: | Hubert Wong Department of Statistics U.B.C. |
| The online forecasting problem is to use the information we have up to time n to forecast the outcome at time n+1. If we have several candidate models for the distribution of the sequence of outcomes, then each one gives a forecast. So, to make our forecast, we choose one of them, or average over all of them. Existing criteria for obtaining the best choice or average are either "empirical" or "model-based". We explain the difference between these two types of criteria and introduce a new "mongrel" criterion. It combines attractive features from the other two types. Our simulation results show that the mongrel criterion gives forecasts that are more accurate than the Bayes prediction approach does across a range of data-generating models. | |
| DATE/PLACE: | Tuesday, March 28, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Parametric Modelling of Point Process Data Arising from a Reaction Time Experiment |
| SPEAKER: | John Braun Department of Mathematics and Statistics University of Winnipeg |
|
Data from a reaction time experiment comes in the form of a bivariate point process realization. Flashes are presented to a subject according to a homogeneous Poisson process, and the subject responds by pressing a button each time the flash is seen. An objective of analysis of such data is to determine whether there are interaction effects due to pairs of consecutive flashes. We propose a simple parametric model for the eye-brain-hand system which underlies the data. In order to complete the specification, it is necessary to identify the probability density of the delay between the occurrence of a flash and the corresponding response. One way of making this identification is to examine the coherence and the cross-intensity function between the flashes and responses. Nonparametric estimators of point process intensity functions and coherence have been studied in a sequence of papers by Brillinger. These estimators exhibit the usual bias-variance trade-off. Choi and Hall (1999) have introduced data sharpening, a bias-reduction procedure for density estimation which reduces the order of magnitude of the bias while increasing the variance by a constant factor. We adapt this method to the point process problem, showing how it may be applied to the estimation of the intensity functions for one-dimensional stationary point processes as well as the coherence. From these nonparametric estimates, it is possible to identify an adequate parametric model. Estimation of the mean delay and a parameter governing the above interaction effect can then proceed using maximum likelihood. |
|
| DATE/PLACE: | Thursday, March 23, 15:30 Department of Mathematics and Statistics Simon Fraser University Room K9509 NOTE TIME AND PLACE |
| TYPE: | Genetics and the Life Sciences |
| TITLE: | Biostatistical Methods for the Genetic Disease Neurofibromatosis |
| SPEAKER: | Harry Joe Statistics, UBC |
| DATE/PLACE: | Tuesday, March 14, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Fast Computation of Depth Contours |
| SPEAKER: | Raymond T. Ng Department of Computer Science U.B.C. |
| DATE/PLACE: | Thursday, March 9, 15:30 Department of Mathematics and Statistics Simon Fraser University Room K9509 NOTE TIME AND PLACE |
| TYPE: | Genetics and the Life Sciences |
| TITLE: | Finding Genes in Genomic Sequence: A Comparative Approach |
| SPEAKER: | David Baillie Biological Sciences and Molecular Biology and Biochemistry, SFU |
| DATE/PLACE: | Tuesday, March 7, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Models for Two-state Disease Processes with Applications to Relapsing-Remitting Multiple Sclerosis. |
| SPEAKER: | Jochen Brumm Department of Statistics UBC |
| In diseases like relapsing-remitting multiple sclerosis (MS), patients experience repeated transitions between symptom-free and symptomatic disease states (the symptomatic state is called an exacerbation). Analyses of this kind of data commonly ignore the information available on the second state (the lengths of the exacerbations, for example).
In this talk, we consider models that incorporate the second state into the analyses. The basic stochastic models are Markov chains, alternating renewal processes and marked point processes. For the Markov chain and alternating renewal process models, we consider simple fixed effects models as well as random effects models where the random effects are introduced to allow for heterogeneity between patients and correlation of data on one patient. For these models, the statistical inference is based on maximum likelihood. For the marked point process model, we use a generalized estimating equation approach. We apply these models to a data set from an MS clinical trial. The aim of the analyses is to relate the available covariates to the disease process. We do not attempt a comprehensive analysis of the data set; rather, the aim here is to see what can be achieved and which questions can be addressed with the different models. |
|
| DATE/PLACE: | Tuesday, February 29, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | A Model Selection Approach to Partially Linear Regression |
| SPEAKER: | Florentina Bunea Department of Statistics University of Washington |
| We extend the model selection approach proposed by Barron, Birgé and Massart (1999) for nonparametric regression to partially linear regression. That is, we consider the model Y = Xß + f(T) + W, where X belongs to R^q, T belongs to R, W is the error independent of (X,T) and f is a function of unknown smoothness. This model has received considerable attention in the literature, and it is mostly used in cases in which the parameter of interest is the linear component, whereas the variables appearing in the nonlinear part are viewed as confounders, hence f is regarded as a nuisance parameter. We study this model in two different cases.
Case A. The number of covariates appearing in the linear part is a priori given, say q. Case B. We have available q possible regressors for the linear part, but only an unknown (possibly much smaller) subset of them are relevant for Y, hence we would like to select it. We propose a penalized least squares approach and, in both cases, we obtain finite sample upper bounds for the risk of the estimator and, as a consequence, the consistency of the estimator of f at the optimal nonparametric rate. We also discuss the distributional properties of hat(ß) in cases A and B and show that sqrt(n) consistency of hat(ß) can be achieved in both cases. |
|
| DATE/PLACE: | Thursday, February 24, 15:30 Department of Mathematics and Statistics Simon Fraser University Room K9509 NOTE TIME AND PLACE |
| TYPE: | Genetics and the Life Sciences |
| TITLE: | Statistical Modelling of Species Occurrences |
| SPEAKER: | Fangliang He Canadian Forest Service, Pacific Forestry Centre, Victoria |
|
Because of the difficulty and high cost of conducting species surveys, information on a species at landscape or regional scales is usually limited to a map of its presence or absence in recording units over a specified time frame. Species data at large scales are increasingly documented in this presence/absence format. These types of data have recently attracted a great deal of attention from ecologists and statisticians. In this talk I will briefly introduce three issues related to this type of data. 1) Prediction of species occurrence: This is basically to restore an image (the geographical distribution of a species), or, say, to predict the occurrence of a species in an area based on known occurrence data in nearby areas. 2) Autologistic regression: To model species occurrence using explanatory variables, an MCMC algorithm will be discussed. This technique is useful for modelling the effects of climate change on species distribution. 3) Estimating abundance from occurrence: In a classical occupancy problem, we throw N balls into M boxes and want to know the number of empty boxes. Here I am interested in the reverse problem: how many balls were thrown, given M boxes and u empty boxes? Solutions to this problem will be used to estimate population abundances for 800 species in a 50 ha (500x1000 m) tropical rainforest plot in Malaysia that is divided into a grid system. Each grid cell is considered as a box and the occupancy status of each cell is known. |
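The reverse occupancy problem in item 3 can be illustrated with a minimal sketch. It assumes that each ball lands in a box independently and uniformly at random, which is the textbook setting and not necessarily the model used in the talk. Under that assumption the expected number of empty boxes is M(1 - 1/M)^N, and equating this to the observed count u gives a simple moment estimate of N. The function names are hypothetical.

```python
import math


def expected_empty_boxes(n_balls: int, n_boxes: int) -> float:
    """Expected number of empty boxes when n_balls are thrown uniformly
    and independently into n_boxes (classical occupancy problem)."""
    return n_boxes * (1.0 - 1.0 / n_boxes) ** n_balls


def estimate_balls(n_boxes: int, n_empty: float) -> float:
    """Moment estimate of the number of balls thrown, obtained by solving
    n_boxes * (1 - 1/n_boxes)**N = n_empty for N.

    Assumes uniform, independent placement; real species distributions
    are typically aggregated, so this is only an illustration.
    """
    if not 0 < n_empty < n_boxes:
        raise ValueError("need 0 < n_empty < n_boxes for a finite estimate")
    return math.log(n_empty / n_boxes) / math.log(1.0 - 1.0 / n_boxes)
```

In the ecological setting, a grid cell plays the role of a box and an individual plays the role of a ball, so `estimate_balls` maps the number of unoccupied cells to a rough abundance figure.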
|
| DATE/PLACE: | WEDNESDAY, February 23, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Atmospheric Data, Nonparametric Methods and Bayesian Interpretations |
| SPEAKER: | Claudia Tebaldi Geophysical Statistics Project Climate and Global Dynamics Division National Center for Atmospheric Research Boulder, CO 80302-3000 |
| The motivation for this work comes from two seemingly unrelated problems in atmospheric science: forecasting clear air turbulence and quantifying the dynamical properties of a climate model.
The common denominator of these two very different applications is the use of nonparametric tools -- MARS (Multivariate Adaptive Regression Splines) and Neural Networks (NNs) -- to deal with high dimensional data. The essence of each of these methods is data-driven model selection through cross-validation, and a useful way to understand such methods is in a Bayesian context. The Bayesian analogue evaluates a posterior probability distribution over the space of models based on the data AND a prior probability distribution on that space. The fundamental question is "what prior is implicitly assumed by model selection criteria like GCV (Generalized Cross-Validation) and by adaptive tools like MARS and NNs?" This question can be answered in two ways. One is by following the mainstream of the Bayesian model selection industry. The other adopts a 'reverse engineering' perspective, reconstructing the prior assumptions via simulation with respect to the MARS and NN estimators. |
|
| DATE/PLACE: | Tuesday, February 15, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Nonlinear Mixed-Effect Models with Missing Time-Dependent Covariates, with Application to HIV Viral Dynamics |
| SPEAKER: | Lang Wu Department of Biostatistics Harvard School of Public Health |
| The study of HIV viral dynamics is an important area in AIDS research. Nonlinear mixed-effect models have been proposed for modeling viral dynamic processes. A challenging problem in the modeling is to identify repeatedly measured, but possibly missing, immunologic or virologic markers (covariates) for viral dynamic parameters. For missing time-dependent covariates in nonlinear mixed-effect models, the commonly used methods may give misleading results. We propose a multiple imputation method which imputes the missing data at the individual level but can pool information across individuals. In situations where the covariate trajectories exhibit distinctive and important patterns, we also propose an alternative imputation method based on modeling the covariate processes. We compare various methods by Monte Carlo simulations, and find that the proposed multiple imputation (MI) method performs best in terms of bias and mean-squared error in the estimates of covariate coefficients. A real dataset is analyzed using the proposed methods. | |
| DATE/PLACE: | Thursday, February 10, 15:30 Department of Mathematics and Statistics Simon Fraser University Room K9509 NOTE TIME AND PLACE |
| TYPE: | Genetics and the Life Sciences |
| TITLE: | Maximum Likelihood Estimation for Seed and Pollen Dispersal Parameters |
| SPEAKER: | Beatrix Jones Statistics, University of Washington |
|
The direction and distance that seeds and pollen travel are important demographic parameters for plant populations. Estimates of the distributions characterizing seed and pollen movement are important in understanding the fine scale genetic structure of a population and the ecological factors that are relevant to its persistence. In addition to conservation of populations, understanding these factors can aid in gauging the impact of the release of genetically engineered or non-native organisms. I present a method for inferring the maximum likelihood estimates of parametric distributions for seed and pollen dispersal distances using genetic data and spatial information at two consecutive generations. Likelihoods are obtained by using Monte Carlo methods to approximate a sum over a large number of discrete latent variables. |
|
| DATE/PLACE: | Thursday, February 3, 15:30 CHANGED FROM THURSDAY JANUARY 27 Department of Mathematics and Statistics Simon Fraser University Room K9509 NOTE TIME AND PLACE |
| TYPE: | Genetics and the Life Sciences |
| TITLE: | A Linkage Disequilibrium (LD) Study in the Newfoundland Population Reduces the Bardet-Biedl Syndrome (I) (BBS1) Interval to 1 cM |
| SPEAKERS: | Terry-Lynn Young, Medicine, Memorial University & William S. Davidson, Molecular Biology and Biochemistry and Dean of Science, SFU |
|
| DATE/PLACE: | Tuesday, February 1, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Statistical Methods for Assessing Habitat Preferences |
| SPEAKER: | Dieter Ayers Computing & Telecom Services University of Northern B.C. |
| It is often the case that samples are taken in a non-random fashion. This thesis attempts to define a methodology by which some analysis can be performed on a specific kind of non-random sample. In studies of wildlife behaviour, a common method of sampling involves tagging an animal and relocating it in subsequent time periods. Considering the number or type of animal present at sampling locations as a random sample is erroneous, as the locations were chosen by the animal and not in a random fashion. Further, with data such as these it is not possible to draw conclusions about areas where no animals were observed, as it is unknown whether these areas were truly free of animals.
We treat the observed data as conditional on the presence of an animal, and then use a Bayesian approach to estimate the probability of finding animals in any given location. This results in a method that allows for mapping the propensity of a certain area to be chosen by an animal in the future. |
|
| DATE/PLACE: | Tuesday, January 18, 2000, 16:00 Leonard S. Klinck 301 (formerly Computer Science 301), 6356 Agricultural Road |
| TITLE: | Over-dispersion in Count Data |
| SPEAKER: | A. H. EL-Shaarawi National Water Research Institute Burlington, Ontario, Canada L7R 4A6 |
| Analysis of data in the environmental sciences often leads to a phenomenon referred to as over-dispersion. This occurrence is particularly common in count data, resulting in larger residual variations than the expected model variation. Approaches for detecting and estimating over-dispersion will be presented and applied to a number of environmental data sets. | |
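As a minimal illustration of what over-dispersion means for count data, the sketch below computes the variance-to-mean ratio (index of dispersion). Under a Poisson model the variance equals the mean, so a ratio well above 1 suggests over-dispersion. This is a generic textbook diagnostic offered as an assumption for illustration, not one of the approaches presented in the talk, and the function name is hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean for a list of counts.

    Values near 1 are consistent with a Poisson model; values well
    above 1 indicate over-dispersion relative to Poisson.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    # statistics.variance uses the unbiased (n - 1) denominator.
    return variance(counts) / m
```

For example, counts that are mostly zero with occasional large values give a ratio far above 1, while perfectly constant counts give 0.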
| DATE/PLACE: | Thursday, January 13, 15:30 BC Cancer Research Centre's Lecture Theatre 601 W.10th Ave. NOTE TIME AND PLACE |
| TYPE: | Genetics and the Life Sciences |
| TITLE: | Bioinformatics at the Genome Sequence Centre and Beyond |
| SPEAKER: | Steven Jones Genome Sequence Centre, B.C. Cancer Agency |
| The term bioinformatics has arisen to encompass the growing field of computational biology. There has been a recent explosion in the ability to determine new DNA and protein sequences. However, determining functions for these sequences through experimental biology has remained expensive and time consuming. This has meant that computational prediction is now the only tool used in deriving meaning from the majority of biological sequences. The genes of complex organisms, such as humans, can differ in their expression both temporally and spatially and will be under the influence of many cellular factors. In addition, the process of identifying the existence of genes and their correct gene structure remains problematic, let alone the putative function of any resulting protein. I will describe some of the problems which bioinformatics is trying to address in the analysis of DNA sequence, as well as some of the other bioinformatic problems being addressed at the Genome Sequence Centre. | |
