Department Seminars 2004


DATE/PLACE: Monday, December 13, 2004, 11:30am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Prediction Using Functional Regression Analysis
SPEAKER: Dr. Randy Eubank,
Department of Statistics,
Texas A&M University
.
Abstract: Methodology is developed for predicting the trajectory of sample paths from a stochastic process using functional regression techniques. The prediction algorithm involves three steps: estimation of the underlying process mean function; estimation of variance components for a random effects model relating the regression function to the sample paths; and a prediction phase accomplished via linear regression with shrinkage toward the estimated mean function. The motivating problem is sales prediction for the Texas Lotto game, which serves as a focal point for the discussion.
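The three steps in the abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the speaker's method: all paths are assumed to share one time grid, the mean and covariance are plain sample estimates (no smoothing, and the variance components are not separated), and the shrinkage step is the best linear predictor of the unobserved part of a new path given its observed part. The simulated data and function names are hypothetical.

```python
import numpy as np


def fit_mean_and_cov(paths):
    """Steps 1-2 (simplified): pointwise mean function and sample covariance.

    paths: array of shape (n_paths, n_times) observed on a common time grid.
    The sample covariance lumps the random-effect and noise variance together.
    """
    mean = paths.mean(axis=0)
    cov = np.cov(paths, rowvar=False)
    return mean, cov


def predict_remaining(y_head, mean, cov):
    """Step 3: best linear predictor of the unobserved tail of a new path.

    The prediction is the estimated mean plus a correction driven by how far
    the observed head departs from the mean, so it shrinks toward the mean.
    """
    k = len(y_head)
    s_hh = cov[:k, :k]   # covariance of the observed head
    s_th = cov[k:, :k]   # cross-covariance between tail and head
    weights = np.linalg.solve(s_hh, y_head - mean[:k])
    return mean[k:] + s_th @ weights


# Hypothetical data: random intercepts and slopes around a seasonal mean.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
mu = 10 + 5 * np.sin(2 * np.pi * t)
a = rng.normal(0, 2, size=(200, 1))
b = rng.normal(0, 1, size=(200, 1))
paths = mu + a + b * t + rng.normal(0, 0.5, size=(200, 20))

mean_hat, cov_hat = fit_mean_and_cov(paths)
new_path = mu + 1.5 + 0.5 * t          # a new path, observed for 8 of 20 times
forecast = predict_remaining(new_path[:8], mean_hat, cov_hat)
```

If the observed part of the new path lies exactly on the estimated mean, the correction term vanishes and the forecast is the mean itself; departures from the mean are carried forward only to the extent the estimated covariance supports them.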
DATE/PLACE: Saturday, December 11, 2004, 9:00am
Segal Centre Conference Rooms,
SFU Harbour Centre,
515 West Hastings Street
Vancouver, BC V6B 5K3
TYPE: Workshop joint with Simon Fraser University
TITLE: Day 2: Functional Data Analysis Instructional Workshop
SPEAKER: Dr. J.O. Ramsay,
Psychology Department,
McGill University
.

Dec 10, 11
9 am - 5 pm


The workshop is free, but please register in advance with nancy@stat.ubc.ca

The workshop is designed to provide something of value to as wide a range of participants as possible, from those interested in whether functional data analysis (FDA) might prove useful in their research to statistical methodologists looking for research problems and new techniques.

Each lecture will begin with one or more case studies, and the initial lectures will be almost entirely case studies. These aim to show the range of possible applications, show what insights might be gained from using FDA methods, and illustrate the challenges that are specific or particularly relevant to the analysis of functional data. Case studies are not "how to" sessions; rather, they address questions such as "Why should I consider this approach?" and "What should I watch out for?"

The first half of the first day will also be more oriented to the preliminaries of functional data analysis:

  • What are functional data?
  • How should they be prepared for analysis?
  • How do we convert discrete noisy data to smooth functions?
  • What data exploration tools are useful?
  • Do the data display both phase and amplitude variation?
  • What about principal components analysis and other exploratory methods?
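For the question of converting discrete noisy data to smooth functions, one common approach is roughness-penalized least squares. The sketch below assumes a periodic curve on [0, 1), a Fourier basis, and a fixed penalty weight `lam`; these choices and all names are illustrative, not taken from the workshop materials.

```python
import numpy as np


def fourier_basis(t, n_harmonics):
    """Columns 1, sin(2*pi*k*t), cos(2*pi*k*t) for k = 1..n_harmonics on [0, 1)."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def roughness_penalty(n_harmonics):
    """Integral over [0, 1] of the squared second derivative of each basis function.

    The Fourier functions are orthogonal, so the penalty matrix is diagonal:
    (2*pi*k)**4 / 2 for both sin and cos at harmonic k, and 0 for the constant.
    """
    diag = [0.0]
    for k in range(1, n_harmonics + 1):
        diag += [(2 * np.pi * k) ** 4 / 2] * 2
    return np.diag(diag)


def smooth(t, y, n_harmonics=10, lam=1e-6):
    """Minimize ||y - Phi c||^2 + lam * c' R c and return the fitted function."""
    phi = fourier_basis(t, n_harmonics)
    coef = np.linalg.solve(phi.T @ phi + lam * roughness_penalty(n_harmonics), phi.T @ y)
    return lambda s: fourier_basis(np.atleast_1d(s), n_harmonics) @ coef


# Hypothetical periodic curve observed at 100 points with noise.
rng = np.random.default_rng(1)
t = np.arange(100) / 100
truth = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)
y = truth + rng.normal(0, 0.2, size=t.size)
f = smooth(t, y)
rms_fit = np.sqrt(np.mean((f(t) - truth) ** 2))
rms_noise = np.sqrt(np.mean((y - truth) ** 2))
```

Increasing `lam` pulls the fit toward a constant, the only unpenalized basis function; in practice it would be chosen by a criterion such as generalized cross-validation.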

The remainder of the first day and some of the second day will consider linear models for functional data. This is a vast topic, and includes relatively basic topics like functional versions of analysis of variance and regression analysis, as well as issues less familiar to statisticians such as how differential equations can be used to model functional data. All approaches assume that the goal is to explain variation in one or more response variables by variation in one or more input or independent variables where, naturally, at least one of the variables involved is functional.
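The simplest functional regression, in which a functional response depends on a functional covariate at the same time point (often called a concurrent model), can be fit by ordinary least squares at each grid point. This sketch assumes curves already sampled on a common grid and leaves the coefficient functions unsmoothed; the data and names are hypothetical.

```python
import numpy as np


def fit_concurrent(x, y):
    """Pointwise least squares for y_i(t) = beta0(t) + beta1(t) * x_i(t) + error.

    x, y: arrays of shape (n_curves, n_times) sampled on a common grid.
    Returns the estimated beta0 and beta1 on that grid (unsmoothed).
    """
    n, m = x.shape
    beta = np.empty((2, m))
    for j in range(m):
        design = np.column_stack([np.ones(n), x[:, j]])
        beta[:, j] = np.linalg.lstsq(design, y[:, j], rcond=None)[0]
    return beta[0], beta[1]


# Hypothetical functional covariate and response on 50 time points.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)
x = rng.normal(size=(100, 50))
y = np.sin(2 * np.pi * t) + (1 + t) * x + rng.normal(0, 0.1, size=(100, 50))
b0, b1 = fit_concurrent(x, y)
```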

For more information on functional data analysis, see the FDA website.

DATE/PLACE: Friday, December 10, 2004, 9:00am
Segal Centre Conference Rooms,
SFU Harbour Centre,
515 West Hastings Street
Vancouver, BC V6B 5K3
TYPE: Workshop joint with Simon Fraser University
TITLE: Day 1: Functional Data Analysis Instructional Workshop
SPEAKER: Dr. J.O. Ramsay,
Psychology Department,
McGill University
.

DATE/PLACE: Thursday, December 09, 2004, 4:00pm
Segal Centre Conference Rooms,
SFU Harbour Centre,
515 West Hastings Street
Vancouver, BC V6B 5K3
TYPE: Statistics Seminar / BRG Seminar joint with Simon Fraser University
TITLE: From Data to Dynamic Models
SPEAKER: Dr. Jim Ramsay,
Psychology Department,
McGill University
.

Differential equations (DIFEs) can represent the underlying processes giving rise to observed functional data, and as such can offer a number of potential advantages over parametric or nonparametric basis expansion models.

  • DIFEs explicitly model the behavior of derivatives and link this behavior to the observed function itself. Consequently, they model the rate of change in the data as well as their amplitudes.
  • Solutions to a linear DIFE of order m span an m-dimensional space, and consequently have the capacity to model curve-to-curve variation as well as to fit the data.
  • We can build known structural features into DIFE models more easily than is usually the case for conventional functional models.
  • Derivative estimates based on DIFEs are usually superior to those derived from conventional data smoothers.
  • Finally, a DIFE offers a wider range of ways to introduce stochastic behavior into models.

In spite of the enormous importance of DIFE models in many areas of science and engineering, existing methods for actually identifying or estimating a differential equation from noisy data remain crude, inefficient, and unable to deliver estimates of sampling error.

I will discuss a technique, based on the work of Heckman and Ramsay (2000), for going directly from the discrete and noisy data to a DIFE. Some illustrations of its performance on simulated data will be offered, as well as examples from chemical engineering and from medical data on treatment regimes for lupus.

Heckman, N. and Ramsay, J. O. (2000) Penalized regression with model-based penalties. The Canadian Journal of Statistics, 28, 241-258.
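To make the idea of estimating a DIFE from data concrete, here is a simple two-stage sketch: smooth the data to get derivative estimates, then regress the second derivative on the function and its first derivative. This is a cruder approach than the Heckman and Ramsay (2000) penalized method the talk builds on; the constant coefficients, local-quadratic smoother, window width and simulated oscillator are all assumptions made for illustration.

```python
import numpy as np


def local_quadratic_derivatives(t, y, half_width):
    """Estimate x, Dx and D^2x at each t[i] by a local quadratic least-squares fit."""
    n = len(t)
    x0, x1, x2 = np.empty(n), np.empty(n), np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        c2, c1, c0 = np.polyfit(t[lo:hi] - t[i], y[lo:hi], 2)  # highest power first
        x0[i], x1[i], x2[i] = c0, c1, 2 * c2
    return x0, x1, x2


def estimate_second_order_dife(t, y, half_width=20):
    """Fit D^2x = -beta0 * x - beta1 * Dx by least squares on smoothed derivatives."""
    x0, x1, x2 = local_quadratic_derivatives(t, y, half_width)
    keep = slice(half_width, len(t) - half_width)   # drop one-sided edge windows
    design = np.column_stack([x0[keep], x1[keep]])
    beta0, beta1 = np.linalg.lstsq(design, -x2[keep], rcond=None)[0]
    return beta0, beta1


# Damped oscillator x(t) = exp(-0.1 t) cos(2 t) solves D^2x + 0.2 Dx + 4.01 x = 0.
rng = np.random.default_rng(3)
t = np.linspace(0, 10, 1001)
y = np.exp(-0.1 * t) * np.cos(2 * t) + rng.normal(0, 0.003, size=t.size)
beta0_hat, beta1_hat = estimate_second_order_dife(t, y)
```

The slice drops points near the ends, where one-sided windows give poorer derivative estimates.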

DATE/PLACE: Tuesday, December 07, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar / BRG Seminar
TITLE: Individual and population penalized regression splines for accelerated longitudinal designs
SPEAKER: Jarek Harezlak,
Harvard School of Public Health
.

Accelerated longitudinal design (ALD) sampling schemes consist of a few observations per sampling unit over a short time span. ALD data are combined across independent units to provide an estimate of the overall population curve and predictions of individual patterns of change. Extending the work of Ruppert, Wand and Carroll (2003), we develop a computationally efficient procedure for longitudinal penalized regression spline (P-spline) methods under ALD sampling schemes. A major advantage of the P-spline methodology is that the models can be fit using standard mixed model software (e.g. PROC MIXED in SAS).
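As a rough illustration of the P-spline idea, the sketch below fits a penalized spline with a truncated-line basis, penalizing only the knot coefficients. Under the mixed-model reading, the knot coefficients are random effects and the penalty weight equals the ratio of the error variance to the random-effect variance, which is why mixed model software can fit such models; here `lam` is simply fixed rather than estimated, and the single-curve setting, knots and data are assumptions.

```python
import numpy as np


def truncated_line_design(x, knots):
    """Fixed part [1, x] and random part (x - kappa_k)_+ of a linear P-spline."""
    fixed = np.column_stack([np.ones_like(x), x])
    random = np.maximum(x[:, None] - knots[None, :], 0.0)
    return fixed, random


def pspline_fit(x, y, knots, lam):
    """Minimize ||y - X b - Z u||^2 + lam * ||u||^2 (only the knot terms are penalized).

    In the mixed-model reading, u ~ N(0, sigma_u^2 I) and errors ~ N(0, sigma^2 I),
    so lam plays the role of sigma^2 / sigma_u^2; here lam is fixed, not estimated.
    """
    X, Z = truncated_line_design(x, knots)
    C = np.hstack([X, Z])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)

    def predict(s):
        Xs, Zs = truncated_line_design(np.asarray(s, dtype=float), knots)
        return np.hstack([Xs, Zs]) @ coef

    return predict


rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.size)
knots = np.quantile(x, np.linspace(0, 1, 22)[1:-1])   # 20 interior knots
fit = pspline_fit(x, y, knots, lam=0.1)
```

A very large `lam` shrinks the knot coefficients to zero and returns the ordinary least-squares line.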

Extensive simulation studies indicate good performance of our method in the settings considered. We compare balanced and complete longitudinal designs to ALDs using the Berkeley Growth study data and we apply our method to the longitudinal brain volume measurements from an ongoing pediatric magnetic resonance imaging (MRI) developmental study.

This talk is based on joint work with Louise Ryan, Nicholas Lange and Jay Giedd.

DATE/PLACE: Tuesday, November 30, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: An Adaptive Radial Basis Function Network Model for Statistical Detection
SPEAKER: Mu Zhu,
Department of Statistics and Actuarial Science,
University of Waterloo
.
We construct a special radial basis function (RBF) network model to detect items belonging to a rare class in a large database. Our primary example is a real drug discovery application. Our method can be viewed as modeling only the rare class but allowing for local adjustments depending on the density of the background class in local neighborhoods. We offer a statistical explanation of why such an approach is appropriate and efficient for the detection problem. Our statistical explanation, together with our empirical success with this model, has implications for a new paradigm for solving such detection problems in general. This work is joint with Wanhua Su and Hugh Chipman.
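The abstract does not give the model's exact form, so the following is only a hypothetical sketch of the general idea it describes: Gaussian bumps are placed at the rare-class training points, and each bump's radius is set from the distances to nearby background points, so that bumps are narrower where the background is dense. The kernel, `k` and `alpha` are assumptions, not the published model.

```python
import numpy as np


def fit_radii(rare, background, k=5, alpha=1.0):
    """Radius for each rare-class center: alpha times the mean distance to its
    k nearest background points (small where the background is dense)."""
    d = np.linalg.norm(rare[:, None, :] - background[None, :, :], axis=2)
    return alpha * np.sort(d, axis=1)[:, :k].mean(axis=1)


def rbf_score(x, rare, radii):
    """Sum of Gaussian bumps centred at the rare-class training points."""
    d2 = np.sum((x[:, None, :] - rare[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * radii[None, :] ** 2)).sum(axis=1)


# Hypothetical data: a dense background class and a small rare cluster.
rng = np.random.default_rng(5)
background = rng.normal(0, 1, size=(500, 2))
rare = rng.normal(2.5, 0.3, size=(20, 2))
radii = fit_radii(rare, background)
scores = rbf_score(np.array([[2.5, 2.5], [0.0, 0.0]]), rare, radii)
```

Candidate items would then be ranked by score, and the top of the list inspected first.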
DATE/PLACE: Tuesday, November 23, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Extracting XML data from HTML repositories
SPEAKER: Ruth Zhang,
Department of Statistics,
UBC
.
I will present our system, which extracts desired information (records) from thousands of HTML documents, starting from a small set of examples. Duplicates in the result are automatically detected and eliminated, and the result is automatically converted to XML. We propose a novel method to estimate the current coverage of the system's results, based on capture-recapture models with unequal capture probabilities. We also propose techniques for estimating the error rate of the extracted information and an interactive technique for enhancing information quality. To evaluate the method and ideas, an extensive set of experiments has been conducted. The experimental results validate the effectiveness and utility of our system.
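The simplest capture-recapture estimate treats two independent extraction runs as two "captures" of the same unknown set of records. The sketch below uses Chapman's version of the Lincoln-Petersen estimator, which assumes equal capture probabilities; the talk's models relax exactly that assumption, so treat this as a baseline only. The record IDs are hypothetical.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected Lincoln-Petersen estimate of the total number of records."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def estimated_coverage(run1, run2):
    """Share of the estimated total number of records found by either run."""
    n_hat = chapman_estimate(len(run1), len(run2), len(run1 & run2))
    return len(run1 | run2) / n_hat, n_hat


# Hypothetical record IDs returned by two independent extraction runs.
run1 = set(range(0, 100))
run2 = set(range(60, 140))
cov, n_hat = estimated_coverage(run1, run2)
```

Here 100 and 80 records are found with 40 in common, giving an estimated total of about 198.5 records, so the 140 distinct records found represent roughly 70% coverage.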
DATE/PLACE: Tuesday, November 16, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Business and Administrative Applications of Statistics
SPEAKER: Dr. Martin Puterman,
Centre for Operations Excellence, Sauder School of Business,
UBC
.
In this talk I will provide an overview of statistical issues we have encountered in projects carried out in the Centre for Operations Excellence (COE). In almost all projects, the key challenge has been to determine what data is required and then how to obtain it. We have also found that much can be gained through simple data displays, although more advanced modeling has also provided useful insights. We will illustrate these points with examples from selected projects in risk analysis, logistics, credit scoring and other areas.
DATE/PLACE: Tuesday, November 02, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Partition Distributions: Applications to Linear Mixed Models and Clustering
SPEAKER: Dr. Glen Takahara,
Department of Mathematics & Statistics,
Queen's University
.
Partition functions arise in statistics in the context of mixture models. I will review how they arise in the semiparametric Bayesian setting and their role in Weighted Chinese Restaurant (WCR) algorithms for i.i.d. sequential importance sampling in such models, and consider modifications to these algorithms tailored to linear mixed effects models. The goal here is to deal with non-normal random effects through a general approach, with an eye to automation, efficiency and numerical stability. Partition functions also arise in non-Bayesian mixture models, which are becoming more popular for generative, or model-based, clustering. I will consider a distance derived from the partition function for model-based hierarchical clustering and feature selection.
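For readers unfamiliar with the restaurant metaphor, the sketch below samples a random partition from the basic (unweighted) Chinese restaurant process, the partition distribution induced by a Dirichlet process prior with concentration `alpha`. It is not the weighted Chinese restaurant importance sampler discussed in the talk, which also takes the data into account when seating each observation; names and parameter values are illustrative.

```python
import random


def sample_crp_partition(n, alpha, rng):
    """Seat n customers by the Chinese restaurant process with concentration alpha.

    Customer i (0-based) joins an existing table of size n_j with probability
    n_j / (i + alpha) and opens a new table with probability alpha / (i + alpha).
    """
    table_sizes, labels = [], []
    for _ in range(n):
        weights = table_sizes + [alpha]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[j] += 1
        labels.append(j)
    return labels, table_sizes


rng = random.Random(6)
labels, sizes = sample_crp_partition(50, 1.0, rng)
```

The expected number of tables after n customers is the sum of alpha / (alpha + i) for i = 0, ..., n - 1, which grows only logarithmically in n.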
DATE/PLACE: Tuesday, October 19, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Model-based Geostatistics: Modelling, Computation and Asymptotics
SPEAKER: Dr. Hao Zhang,
Department of Statistics,
Washington State University
.
Explicit stochastic processes have been increasingly used to model geostatistical data, and likelihood-based inferential methods are consequently employed in the data analysis. This model-based approach can effectively model and analyze spatial counts through, for example, spatial generalized linear mixed models. I first review spatial generalized linear mixed models and provide some analytic results that can be used together with Markov chain Monte Carlo methods to reduce the computation required for inference. I then cover some asymptotic results under the infill asymptotic framework that can be used to explain finite-sample properties. Finally, I address the practical question: given a finite sample, should one use the increasing-domain asymptotic results or the infill asymptotic results?
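A spatial generalized linear mixed model for counts can be illustrated by simulation: a latent stationary Gaussian field enters the log mean of conditionally independent Poisson counts. The exponential covariance, parameter values and locations below are assumptions chosen for illustration; fitting such a model would typically require integrating over the latent field, which is where Markov chain Monte Carlo comes in.

```python
import numpy as np


def exponential_cov(coords, sigma2, phi):
    """Stationary exponential covariance sigma2 * exp(-distance / phi)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return sigma2 * np.exp(-d / phi)


def simulate_spatial_counts(coords, beta0, sigma2, phi, rng):
    """Latent Gaussian field S, then counts Y_i | S ~ Poisson(exp(beta0 + S_i))."""
    cov = exponential_cov(coords, sigma2, phi)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(coords)))  # tiny jitter
    field = chol @ rng.standard_normal(len(coords))
    return rng.poisson(np.exp(beta0 + field)), field


rng = np.random.default_rng(7)
coords = rng.uniform(0, 1, size=(100, 2))   # hypothetical sampling locations
counts, field = simulate_spatial_counts(coords, beta0=1.0, sigma2=0.5, phi=0.2, rng=rng)
```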
DATE/PLACE: Thursday, October 14, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar / BRG Seminar
TITLE: Bayesian sensitivity analysis for unmeasured confounding in observational studies
SPEAKER: Lawrence McCandless,
Department of Statistics,
UBC
.
Systematic error due to possible unmeasured confounding may weaken the validity of findings from observational studies investigating the effects of exposures on disease. Because study subjects are assigned to exposure levels in a non-random way, hidden differences between exposure groups may bias effect estimates in a way that is difficult to predict. A solution is to conduct a Bayesian sensitivity analysis (BSA), which incorporates uncertainty about unmeasured confounding into the analysis as prior distributions on bias parameters. Markov chain Monte Carlo techniques can then be used to summarize the posterior distribution of the exposure effect given the data and prior beliefs about unmeasured confounding. We consider BSA in the context of logistic regression models for a binary exposure, binary outcome, binary unmeasured confounder and covariate vector. Because the resulting model is not identifiable, standard theory governing the large-sample behaviour of posterior distributions cannot be applied, complicating an evaluation of the performance of BSA. Using simulation studies, we demonstrate that if the prior distribution for the analysis of datasets from a sequence of observational studies approximates the distribution from which study parameters arise, then the coverage probabilities of BSA 95% credible intervals will be approximately 95% on average. Moreover, BSA credible intervals may yield greater average coverage probabilities of the true exposure effect compared to methods that ignore unmeasured confounding. As an example, we investigate the effect of possible unmeasured confounding on the risk of elevated triglyceride levels among HIV-infected persons treated with protease inhibitors.
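A simpler relative of Bayesian sensitivity analysis is Monte Carlo bias analysis: draw the bias parameters for a binary unmeasured confounder from prior distributions and divide the observed risk ratio by the implied confounding bias. The sketch below uses the classical bias formula for a binary confounder with no effect modification; the priors and the observed ratio are hypothetical, and unlike full BSA it ignores random error in the observed estimate.

```python
import numpy as np


def bias_factor(p1, p0, rr_ud):
    """Confounding bias in a risk ratio from a binary unmeasured confounder U.

    p1, p0: prevalence of U among exposed / unexposed; rr_ud: risk ratio of U on disease.
    Assumes no effect modification by U.
    """
    return (p1 * (rr_ud - 1) + 1) / (p0 * (rr_ud - 1) + 1)


def monte_carlo_sensitivity(rr_obs, n_draws, rng):
    """Draw bias parameters from hypothetical priors and adjust the observed ratio."""
    p1 = rng.beta(4, 6, n_draws)                               # prior: P(U=1 | exposed)
    p0 = rng.beta(2, 8, n_draws)                               # prior: P(U=1 | unexposed)
    rr_ud = np.exp(rng.normal(np.log(2.0), 0.3, n_draws))     # prior: U-disease risk ratio
    adjusted = rr_obs / bias_factor(p1, p0, rr_ud)
    return np.percentile(adjusted, [2.5, 50, 97.5])


rng = np.random.default_rng(8)
low, median, high = monte_carlo_sensitivity(rr_obs=1.8, n_draws=10000, rng=rng)
```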
DATE/PLACE: Tuesday, October 12, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Projection Properties of Non-Regular Fractional Factorial Designs
SPEAKER: Dr. Jason Loeppky,
Department of Statistics,
UBC
.
In many industrial applications, screening experiments are performed at the initial stages of the experimental process to test the significance of a large number of main effects and some two-factor interactions. Typically, the experimenter chooses a design with a relatively small number of runs that will allow for the estimation of a large number of main effects and some two-factor interactions, assuming that only a few of the main effects are active. The difficulty in most experimental situations is two-fold: often the experimenter has no prior knowledge of which effects are important, so it is desirable to select a design that allows joint estimation of all main effects and the associated two-factor interactions; and cost usually limits the number of experimental trials that can be performed. In this talk we introduce the projection estimation capacity sequence and use it to select good designs. We focus attention on the selection of non-regular fractional factorial designs, and results are presented for designs with 20, 24 and 28 runs.
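The abstract does not define the projection estimation capacity sequence, so the sketch below computes only a related, simplified summary (an assumption of this example): for each number of factors k, the fraction of k-factor projections of a design in which the intercept, all main effects and all two-factor interactions are jointly estimable. It uses the 12-run Plackett-Burman design, a standard non-regular design built from cyclic shifts of a generator row plus a row of minus signs, rather than the 20, 24 and 28-run designs in the talk.

```python
import itertools

import numpy as np


def plackett_burman_12():
    """12-run Plackett-Burman design: 11 cyclic shifts of the generator row plus a row of -1s."""
    gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
    rows = [np.roll(gen, i) for i in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows)


def estimable_fraction(design, k):
    """Fraction of k-factor projections in which the intercept, the k main effects
    and all k(k-1)/2 two-factor interactions are jointly estimable (full column rank)."""
    n, m = design.shape
    good = total = 0
    for cols in itertools.combinations(range(m), k):
        sub = design[:, list(cols)]
        terms = [np.ones(n)] + [sub[:, i] for i in range(k)]
        terms += [sub[:, i] * sub[:, j] for i, j in itertools.combinations(range(k), 2)]
        model = np.column_stack(terms)
        good += int(np.linalg.matrix_rank(model) == model.shape[1])
        total += 1
    return good / total


pb = plackett_burman_12()
frac3 = estimable_fraction(pb, 3)
```

The value for k = 3 is 1.0, consistent with the known property that every three-factor projection of this design contains a full 2^3 factorial; `estimable_fraction(pb, 4)` extends the summary to four-factor projections.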
DATE/PLACE: Tuesday, October 05, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Super-Brownian Motion and Critical Spatial Stochastic Systems
SPEAKER: Dr. Edwin Perkins,
Department of Mathematics,
UBC
.
Brownian motion arises as a universal limiting object for small centered fluctuations. In the same way super-Brownian motion arises as a universal limiting object for spatial stochastic systems near criticality. The list of stochastic systems which approach super-Brownian motion under rescaling includes distributions of genotypes undergoing random genetic drift and mutation, stochastic models for epidemic spread (contact processes), competing species models in mathematical ecology (including the voter model), and percolation models at criticality and in sufficiently high dimensions. Some of these connections will be presented and discussed.
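The sketch below simulates the kind of particle system whose rescaled limit is super-Brownian motion: a critical branching random walk in which each particle independently dies or splits in two with probability 1/2 each, and each child takes a +/-1 step. Parameters are illustrative. Under the usual rescaling (N initial particles, about N generations, mass 1/N per particle and space scaled by 1/sqrt(N)), the empirical measure of such systems approaches super-Brownian motion as N grows.

```python
import numpy as np


def critical_branching_walk(n_initial, n_generations, rng):
    """Each generation, every particle dies or splits in two (prob. 1/2 each),
    and each child takes an independent +/-1 step from its parent's position."""
    positions = np.zeros(n_initial)
    sizes = [n_initial]
    for _ in range(n_generations):
        n_children = 2 * rng.integers(0, 2, size=positions.size)  # 0 or 2: mean 1
        parents = np.repeat(positions, n_children)
        positions = parents + rng.choice([-1.0, 1.0], size=parents.size)
        sizes.append(int(positions.size))
    return positions, sizes


rng = np.random.default_rng(9)
positions, sizes = critical_branching_walk(n_initial=100, n_generations=20, rng=rng)
```

Because the mean number of children is exactly 1, the expected population size stays at its initial value even though individual runs fluctuate widely and eventually die out.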
DATE/PLACE: Tuesday, September 28, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Conditional Distribution of Goodness-of-Fit Tests
SPEAKER: Dr. Federico O'Reilly,
Instituto de Investigaciones en Matematicas Aplicadas y en Sistemas UNAM, Mexico
.
The idea is to advocate the use of the conditional distribution of a goodness-of-fit test statistic given the value of Tn, the minimal sufficient statistic, when testing the fit of a distribution with unknown parameters. Since the parameters themselves are not of interest, they are treated as nuisance parameters, and so conditioning seems appropriate. Some comments are made regarding this procedure, and emphasis is placed on the fact that this approach needs no sets of tables, only a simulation-based algorithm that produces the exact conditional p-value. It is therefore claimed to be an exact-level, finite-n procedure in the continuous case. It may be used in the discrete case, but the level would then be approximate because of the discreteness of Tn. The inverse Gaussian case is discussed, comparing the results of the advocated procedure with recent work and showing an increase in power for the alternatives studied.
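The speaker's worked example is the inverse Gaussian; the sketch below instead uses the normal family, where the conditional simulation is especially simple. The Kolmogorov-Smirnov distance to a normal fitted by the sample mean and standard deviation is unchanged by location-scale transformations, so by Basu's theorem it is independent of the complete sufficient statistic (sample mean, sample variance); its conditional null distribution given Tn therefore equals its unconditional one and can be simulated from N(0, 1). The choice of statistic, the data and the number of simulations are assumptions.

```python
import math

import numpy as np


def ks_normal_stat(x):
    """Kolmogorov-Smirnov distance between the empirical CDF of x and the normal CDF
    with mean and standard deviation estimated from x (location-scale invariant)."""
    z = np.sort((x - x.mean()) / x.std(ddof=1))
    n = len(z)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    i = np.arange(1, n + 1)
    return max(np.max(i / n - cdf), np.max(cdf - (i - 1) / n))


def conditional_pvalue(x, n_sim, rng):
    """Monte Carlo p-value from the statistic's null distribution given Tn.

    For the normal family that conditional distribution equals the unconditional
    one (Basu's theorem), so standard normal samples of the same size are simulated."""
    observed = ks_normal_stat(x)
    sims = np.array([ks_normal_stat(rng.standard_normal(len(x))) for _ in range(n_sim)])
    return (1 + np.sum(sims >= observed)) / (n_sim + 1)


rng = np.random.default_rng(10)
x = rng.exponential(size=200)          # hypothetical data, clearly non-normal
p_value = conditional_pvalue(x, n_sim=199, rng=rng)
```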
DATE/PLACE: Thursday, September 16, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar / BRG Seminar
TITLE: Post-Data Pivotal Inference in the Balanced One-Way Random Effects Model
SPEAKER: Dr. John F. Brewster,
Department of Statistics,
University of Manitoba
.
In problems involving ordered or truncated parameters, classical inference procedures fail to take account of the parameter constraints and thus have poor post-data coherence properties. In random effects models, in particular, inference procedures based on the usual pivotal distributions fail to take account of orderings on the expected mean squares. This can result in negative estimates of variance components, for example. In such problems it is proposed that the distributions of the usual pivotals be replaced by "conditional versions". This results in what is called post-data pivotal (PDP) inference. In this talk the PDP methodology will be illustrated through the balanced one-way random effects model. Here the PDP procedures are shown to have good pre-data (decision-theoretic) properties, as well as good post-data interpretations. In particular, point and interval estimation procedures based on the PDP approach "dominate" the corresponding unconditional procedures and are faithful to the parameter constraints imposed by the model. They also have a Bayesian interpretation. (This is joint work with Dennis J. Murphy.)
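The negative-estimate problem mentioned above is easy to reproduce with the usual method-of-moments (ANOVA) estimators for the balanced one-way model, where E(MSA) = sigma_e^2 + n sigma_a^2 and E(MSE) = sigma_e^2. The small data set below is hypothetical; the sketch shows the problem, not the post-data pivotal remedy.

```python
import numpy as np


def anova_variance_components(y):
    """Method-of-moments estimates for the balanced one-way random effects model.

    y: array of shape (a groups, n observations per group).
    Uses E(MSA) = sigma_e^2 + n * sigma_a^2 and E(MSE) = sigma_e^2.
    """
    a, n = y.shape
    group_means = y.mean(axis=1)
    msa = n * np.sum((group_means - y.mean()) ** 2) / (a - 1)
    mse = np.sum((y - group_means[:, None]) ** 2) / (a * (n - 1))
    return (msa - mse) / n, mse


# Hypothetical data whose group means happen to coincide.
y = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 3.0]])
sigma_a2_hat, sigma_e2_hat = anova_variance_components(y)   # sigma_a2_hat = -5/3
```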
DATE/PLACE: Tuesday, August 17, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Quantile Regression Methods For Reference Growth Charts
SPEAKER: Ying Wei,
Department of Statistics,
Columbia University
.
Reference growth charts are widely used to screen measurements from an individual subject in the context of population values. Estimation of reference growth curves has traditionally relied on the assumption of normality and on cross-sectional data. More flexible methods based on nonparametric or semiparametric quantile regression are shown to compare favorably with earlier methods, particularly for longitudinal growth models that incorporate prior growth history and other covariates. The new methods are illustrated with data used for the modern Finnish reference charts for height.
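Linear quantile regression, the simplest member of the family of methods described, minimizes the check loss and can be written as a linear program. The sketch below assumes SciPy's `linprog` with the HiGHS solver; the covariate and simulated heights are hypothetical, and it omits the nonparametric, semiparametric and longitudinal extensions from the talk.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau):
    """Minimize sum of check losses rho_tau(y - X b) as a linear program.

    Variables: b (free), u >= 0, v >= 0 with X b + u - v = y, so u and v are the
    positive and negative parts of the residuals, weighted tau and 1 - tau.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


rng = np.random.default_rng(11)
age = rng.uniform(2, 18, 300)                     # hypothetical ages in years
height = 80 + 6 * age + rng.normal(0, 5, 300)     # hypothetical heights in cm
X = np.column_stack([np.ones_like(age), age])
b90 = quantile_regression(X, height, tau=0.9)     # 90th-percentile line
```

At the solution, at least a fraction tau of the observations lie on or below the fitted line, which is what makes it an estimate of the conditional tau-quantile.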
DATE/PLACE: Tuesday, July 27, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Principal Directions of a Continuous Random Variable
SPEAKER: Dr. Daniel Cuadras,
University of Barcelona
.

The Principal Components Decomposition of a discrete data matrix is a widely used technique in Multivariate Analysis. Given multivariate data from a population, its objective is to find new variables that explain the most important sources of variability while being mutually uncorrelated. These new variables, called Principal Components, make the data easier to analyze by reducing dimensionality, while at the same time not interfering with each other.

We can interpret a discrete data matrix as a group of discrete variables. We are interested in knowing if it would be possible to do the same with any continuous random variable, obtaining the equivalent to the Principal Components. The answer is affirmative, and we can find a series of random variables, called Principal Directions, with analogous properties to those of the discrete case. Actually, we can obtain them by generalizing all the mathematical operations from the discrete case, to their continuous equivalent in the general case. Thanks to this decomposition of this random variable, we find a more convenient way to represent it. It has some applications, like formulating a continuous extension of multidimensional scaling, obtaining a graphical test to distinguish between similar distributions, improving some tests of independence by relating principal components, contributing to the study of the asymptotic distribution of some statistics related to Rao's ANOQE (a generalization of ANOVA), studying some tests of goodness of fit, etc.
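One natural continuous analogue, assumed here for illustration, takes the covariance kernel K(s, t) = Cov(1{X <= s}, 1{X <= t}) = F(min(s, t)) - F(s)F(t) and studies its eigendecomposition. For X uniform on (0, 1) this is the Brownian bridge covariance min(s, t) - st, whose eigenvalues are known to be 1/(j pi)^2, so a simple discretization can be checked against them. The grid size and function name are assumptions.

```python
import numpy as np


def continuous_pca_uniform(n_grid=400, n_components=3):
    """Leading eigenvalues of K(s, t) = min(s, t) - s*t, the kernel
    Cov(1{X <= s}, 1{X <= t}) for X ~ Uniform(0, 1), via the midpoint rule."""
    h = 1.0 / n_grid
    s = (np.arange(n_grid) + 0.5) * h
    K = np.minimum.outer(s, s) - np.outer(s, s)
    eigvals = np.linalg.eigvalsh(K * h)[::-1]   # eigvalsh is ascending; reverse it
    return eigvals[:n_components]


vals = continuous_pca_uniform()
expected = [1 / (j * np.pi) ** 2 for j in (1, 2, 3)]
```

The corresponding eigenvectors approximate sqrt(2) sin(j pi t), the continuous counterparts of principal component loadings.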

DATE/PLACE: Tuesday, July 13, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Separation Index and Low-Dimensional Visualization in Cluster Analysis
SPEAKER: Weiliang Qui,
Doctoral Candidate, Department of Statistics,
UBC
.

Cluster analysis (unsupervised learning) is a challenging problem. Its goal is to detect separated subsets, or clusters, in high-dimensional data sets. Many clustering methods have been proposed, and applying different methods to the same data set usually yields different partitions. Cluster validation methods are therefore needed to compare different partitions of the same data set and to check the appropriateness of a given partition without knowledge of the true cluster structure. We consider validation methods based on the degree of separation among clusters.

In this talk, we propose a separation index to measure the degree of separation (or magnitude of the gap) between a pair of clusters. The separation index is intuitively appealing and easy to compute, and it can be used to compare partitions from different clustering methods. Based on the separation index, we propose a method to visualize the degree of separation between a pair of clusters in a low-dimensional space. We also propose a sequential clustering method based on the separation index that simultaneously estimates the number of clusters and obtains partitions.
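The abstract does not spell out the index, so the following is a hypothetical quantile-based version in the same spirit: project both clusters onto the Fisher discriminant direction and compare the lower quantile of the right-hand cluster with the upper quantile of the left-hand one. The choice of direction and of `alpha` are assumptions. The index is at most 1, and it is positive when the projected clusters leave a gap between those quantiles.

```python
import numpy as np


def separation_index(c1, c2, alpha=0.05):
    """Quantile-based separation between two clusters along the Fisher direction.

    Returns (L2 - U1) / (U2 - L1), where L and U are the alpha/2 and 1 - alpha/2
    quantiles of each projected cluster and cluster 2 projects to the right.
    """
    m1, m2 = c1.mean(axis=0), c2.mean(axis=0)
    pooled = np.cov(c1, rowvar=False) + np.cov(c2, rowvar=False)
    w = np.linalg.solve(pooled, m2 - m1)
    y1, y2 = c1 @ w, c2 @ w
    l1, u1 = np.quantile(y1, [alpha / 2, 1 - alpha / 2])
    l2, u2 = np.quantile(y2, [alpha / 2, 1 - alpha / 2])
    return (l2 - u1) / (u2 - l1)


rng = np.random.default_rng(12)
c1 = rng.normal(0, 1, size=(200, 2))
c2 = rng.normal(0, 1, size=(200, 2)) + np.array([6.0, 0.0])   # well separated
j_far = separation_index(c1, c2)
j_same = separation_index(c1, rng.normal(0, 1, size=(200, 2)))  # fully overlapping
```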

DATE/PLACE: Wednesday, June 16, 2004, 11:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Multiple hypothesis testing and clustering: statistical methods for analysis of high dimensional biological data
SPEAKER: Dr. Katherine Pollard,
Postdoctoral Researcher, Center for Biomolecular Science & Engineering,
University of California, Santa Cruz
.
Discovering meaningful patterns in the wealth of data produced by gene expression experiments and genome sequencing projects requires rigorous statistical methods. Multiple testing problems arise whenever one wishes to perform statistical tests for each of many genes or genomic regions. Identifying differentially expressed genes from microarray experiments is a typical example. We have derived a general characterization of the null distribution for multiple testing that asymptotically controls type I error rates without conditions such as subset pivotality. This characterization is novel because it uses the distribution of the test statistics rather than a data null distribution. A simple bootstrap estimator of this distribution is presented. I describe general single-step and step-down multiple testing procedures, as well as augmentation procedures designed to improve power. With a statistically significant subset in hand, clustering methods assist in the identification of patterns in the data. We have developed a hybrid clustering algorithm called HOPACH, which combines the strengths of both partitioning and agglomerative hierarchical clustering methods. Using this algorithm as an example, I demonstrate how the bootstrap can be employed as a statistical method in cluster analysis to establish the reproducibility of the clusters and the overall variability of the procedure. Applications to microarray data and comparative genomics illustrate the methodologies.
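As a reference point for the step-down idea, the sketch below implements Holm's classical step-down procedure, which controls the family-wise error rate using marginal p-values only. The procedures in the talk instead estimate the joint null distribution of the test statistics by bootstrap, which this sketch does not attempt; the p-values are hypothetical.

```python
import numpy as np


def holm_stepdown(pvalues, alpha=0.05):
    """Holm's step-down procedure: compare the r-th smallest p-value (r = 0, 1, ...)
    with alpha / (m - r) and stop at the first one that fails."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


decisions = holm_stepdown([0.01, 0.04, 0.03, 0.005])   # rejects the 1st and 4th
```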
DATE/PLACE: Friday, June 04, 2004, 11:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Unsupervised Feature Selection in Mixture Modelling - A Bayesian Approach
SPEAKER: Natalie Thompson,
MSc Student, Department of Statistics,
UBC
.
Mixture modelling is ideal for depicting the heterogeneity in cluster analysis. Though many authors have created or improved upon algorithms for fitting mixture models, few include feature selection in the unsupervised algorithms. The performance of the unsupervised learning algorithm presented is greatly improved by the removal of noisy and unnecessary features. This improvement is illustrated by using the algorithm to cluster two synthetic examples and one Corel dataset of annotated images.
DATE/PLACE: Thursday, June 24, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: BRG Seminar
TITLE: Composite Likelihood Estimating Procedures in Familial Data Analysis
SPEAKER: Yinshan Zhao,
Department of Statistics,
UBC
.
Two classes of models can be used to analyze various types of familial data: multinormal random effects models and multinormal copula models. However, these models are hindered in practice by their computational difficulties. We propose two estimation procedures based on composite likelihoods of univariate and bivariate margins. The first is a two-stage method in which the univariate parameters are estimated from the sum of log likelihoods of univariate margins, and the dependence parameters are estimated separately from the sum of log likelihoods of bivariate margins, with the univariate marginal parameters replaced by their estimates. In the second method, all the parameters are estimated from the weighted sum of log likelihoods of bivariate margins. These two composite likelihood methods can greatly reduce computation in parameter estimation, but at the price of some loss of efficiency. For some special cases, we compared the asymptotic efficiency of these two methods with that of the maximum likelihood method. We found that the performance of the two methods is reasonable, except that when the dependence is strong, the first approach is inefficient for the regression parameters. We also found that the second approach is generally better for the regression parameters, but less efficient for the dependence parameters when the dependence is weak.
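The first (two-stage) method can be sketched for the simplest case of an exchangeable multivariate normal model without covariates: estimate the common mean and standard deviation from the univariate margins, then estimate the correlation by maximizing the sum of bivariate normal log-likelihoods over all within-family pairs. The simulated families, bounds and names are assumptions; regression parameters and copula models from the talk are omitted.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def two_stage_pairwise(y):
    """y: (n_families, family_size). Stage 1: mean and sd from the univariate margins.
    Stage 2: rho maximizing the sum of bivariate normal log-likelihoods over pairs."""
    mu, sd = y.mean(), y.std()
    z = (y - mu) / sd
    k = y.shape[1]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    z1 = np.concatenate([z[:, i] for i, _ in pairs])
    z2 = np.concatenate([z[:, j] for _, j in pairs])

    def neg_pairwise_loglik(rho):
        # Standardized bivariate normal log-density, dropping terms free of rho.
        q = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
        return np.sum(0.5 * np.log(1 - rho ** 2) + 0.5 * q)

    res = minimize_scalar(neg_pairwise_loglik, bounds=(-0.99, 0.99), method="bounded")
    return mu, sd, res.x


# Hypothetical families of size 3 with exchangeable correlation 0.5.
rng = np.random.default_rng(13)
n_fam, size, rho = 2000, 3, 0.5
shared = rng.normal(size=(n_fam, 1))
y = 10 + 2 * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=(n_fam, size)))
mu_hat, sd_hat, rho_hat = two_stage_pairwise(y)
```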
DATE/PLACE: Thursday, May 20, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: BRG Seminar
TITLE: Simultaneous Inference for Generalized Linear Mixed Models with Informative Dropout and Missing Covariates
SPEAKER: Kunling Wu,
Department of Statistics,
UBC
.
Generalized linear mixed effects models (GLMMs) are popular in many longitudinal studies. In these studies, however, missing data problems arise frequently, which makes statistical analyses more complicated. In this thesis, we propose an exact method and an approximate method for GLMMs with informative dropouts and missing covariates, and provide a unified approach for simultaneous inference. Both methods are implemented by Monte Carlo EM algorithms. The approximate method is based on a Taylor series expansion, and it avoids sampling the random effects in the E-step. Thus, the approximate method may be computationally more efficient when the dimension of the random effects is not small. We also briefly discuss other methods for accelerating the EM algorithms. To illustrate the proposed methods, we analyze two real datasets: an AIDS 315 dataset and a dataset from a parent bereavement project. A simulation study is conducted to evaluate the performance of the proposed methods under various situations.

DATE/PLACE: Thursday, April 29, 2004, 11:00am
Room 311
AMPEL Building, UBC
TYPE: Colloquium in Computer Science
TITLE: Sequential Monte Carlo Samplers
SPEAKER: Dr. Arnaud Doucet,
Department of Engineering,
Cambridge University, UK
.

In this paper, we propose a general methodology to sample sequentially from a series of probability distributions known up to a normalizing constant and defined on a common space; i.e. in a context where one usually uses Markov chain Monte Carlo (MCMC). These probability distributions are approximated by a cloud of weighted random samples which are propagated over time using Sequential Monte Carlo methods. This methodology allows us to derive not only simple algorithms to make parallel Markov chain Monte Carlo runs interact in a principled way but also to obtain new methods for global optimization and sequential Bayesian estimation. We demonstrate the performance of these algorithms through simulation for various integration and global optimization tasks arising in the context of Bayesian inference. Precise convergence results have been established and will also be presented.
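A minimal instance of the sampler described above, under assumptions chosen for illustration: the sequence of distributions tempers from a normalized N(0, 3^2) starting distribution to an unnormalized target, the particles are resampled at every step, and each step applies a few random-walk Metropolis moves that leave the current distribution invariant. The product of average incremental weights estimates the target's normalizing constant, which for the toy target here is known (sqrt(2 pi) x 0.5), so it can be checked. All names and tuning values are hypothetical.

```python
import numpy as np


def log_normal_pdf(x, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def log_gamma(x):
    """Toy unnormalized target exp(-(x - 1)^2 / (2 * 0.5^2)); Z = sqrt(2 pi) * 0.5."""
    return -0.5 * ((x - 1.0) / 0.5) ** 2


def smc_sampler(n_particles, n_steps, rng, n_mcmc=5, step=0.5):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.normal(0.0, 3.0, n_particles)   # exact draws from pi_0 = N(0, 3^2)
    log_z = 0.0

    def log_target(z, beta):
        # Tempered density pi_0^(1 - beta) * gamma^beta (unnormalized).
        return (1 - beta) * log_normal_pdf(z, 0.0, 3.0) + beta * log_gamma(z)

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental weights gamma_t(x) / gamma_{t-1}(x) at the current particles.
        log_w = (b - b_prev) * (log_gamma(x) - log_normal_pdf(x, 0.0, 3.0))
        max_lw = log_w.max()
        w = np.exp(log_w - max_lw)
        log_z += max_lw + np.log(w.mean())   # weights are uniform after resampling
        # Multinomial resampling.
        x = x[rng.choice(n_particles, size=n_particles, p=w / w.sum())]
        # Random-walk Metropolis moves invariant for the current tempered target.
        for _ in range(n_mcmc):
            prop = x + step * rng.standard_normal(n_particles)
            accept = np.log(rng.random(n_particles)) < log_target(prop, b) - log_target(x, b)
            x = np.where(accept, prop, x)
    return x, log_z


rng = np.random.default_rng(14)
particles, log_z = smc_sampler(3000, 100, rng)
```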

In the CS talk (to be given Thursday, 29 April) I will emphasize the applications, whereas in the stats talk I will emphasize the methodology.

DATE/PLACE: Monday, April 19, 2004, 2:30pm
Angus 425,
2053 Main Mall, UBC
TYPE: Statistics Seminar / BRG Seminar
TITLE: Using the Gene Ontology
SPEAKER: Dr. Robert Gentleman,
Department of Biostatistics,
Harvard School of Public Health
.
The analysis of gene expression data is one of the standard problems in bioinformatics. However, many approaches do not make use of relevant and available biological data (or meta-data). In this talk I will demonstrate some of the potential uses of data provided by the Gene Ontology (as well as other meta-data resources) to help guide the analysis and to provide other avenues for understanding the observed data. Aspects of visualization and statistical testing will be considered.
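One common use of Gene Ontology annotations is an enrichment test: is a GO term over-represented among selected genes relative to a reference set? The standard one-sided hypergeometric calculation is sketched below; whether the talk uses this exact test is not stated, and the counts are hypothetical.

```python
from math import comb


def go_enrichment_pvalue(n_universe, n_in_term, n_selected, n_overlap):
    """P(at least n_overlap of the selected genes carry the GO term) when n_selected
    genes are drawn without replacement from n_universe genes, n_in_term of which
    are annotated to the term (one-sided hypergeometric test)."""
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


# Hypothetical counts: 10 genes, 4 annotated to the term, 3 selected, 2 of them annotated.
p = go_enrichment_pvalue(10, 4, 3, 2)   # (6*6 + 4*1) / 120 = 1/3
```

With many GO terms tested at once, these p-values would themselves need a multiple testing adjustment.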
DATE/PLACE: Friday, April 16, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: BRG Seminar
TITLE: Regression Models for Skewed Health Care Costs
SPEAKER: Dr. Xiao-Hua Andrew Zhou,
Department of Biostatistics,
University of Washington
.

There has been increasing emphasis on cost comparison and cost-effectiveness analysis in health care delivery systems. However, the correct analysis of health care costs may be impeded by the following characteristics of cost data: (1) a high proportion of patients with zero costs, (2) a highly skewed distribution of non-zero costs, and (3) heteroscedasticity (non-constant variance). If these characteristics are not taken into account, statistical analyses of health care costs can lead to unreliable inferences and predictions. In this talk we first briefly review some statistical methods for one-sample, two-sample, and ANOVA-type problems that can adjust for these characteristics. We then introduce a new semi-parametric regression model for cost data with some zero values. The proposed approach is based on a monotone transformation of the cost observations and analysis of the data on the transformed scale. The main complication in such an approach is retransformation bias, which arises when results are transformed back to the original scale for prediction and forecasting. Because transformation of the nonzero responses may not achieve normality and homoscedasticity, we fit a semi-parametric heteroscedastic regression model to the transformed nonzero responses and then propose two non-parametric estimators for the mean response on the original scale.
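The retransformation problem can be seen in a few lines. The sketch below fits a simple two-part model (the proportion of nonzero costs, then log(cost) on one covariate) and compares naive retransformation with Duan's smearing estimator, a well-known nonparametric correction. Smearing assumes homoscedastic errors on the log scale, which is exactly the kind of assumption the talk's semi-parametric heteroscedastic approach relaxes, so this is background rather than the proposed method; the data in the usage note are made up.

```python
import math

def fit_line(x, y):
    # Ordinary least squares for y = a + b*x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def two_part_mean(x, cost, x_new):
    """Predicted mean cost at x_new from a simple two-part model.

    Part 1: P(cost > 0), here just the overall proportion (ignores x).
    Part 2: log(cost) regressed on x among positive costs, retransformed
    either naively or with Duan's smearing factor."""
    p_pos = sum(c > 0 for c in cost) / len(cost)
    pos_x = [xi for xi, c in zip(x, cost) if c > 0]
    log_c = [math.log(c) for c in cost if c > 0]
    a, b = fit_line(pos_x, log_c)
    resid = [lc - (a + b * xi) for xi, lc in zip(pos_x, log_c)]
    smear = sum(math.exp(r) for r in resid) / len(resid)  # smearing factor
    naive = math.exp(a + b * x_new)  # exp(E[log cost]) understates E[cost]
    return p_pos * naive, p_pos * naive * smear
```

With two_part_mean([1, 2, 3, 4, 5, 6], [0, 10, 30, 0, 200, 150], 4), the smeared prediction exceeds the naive one, because the average of exp(residual) is at least 1 whenever the residuals average zero.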

DATE/PLACE: Thursday, April 15, 2004, 11:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Recent Works in Statistics and Substantive Areas
SPEAKER: Dr. Ying MacNab,
Department of Health Care and Epidemiology,
UBC
.
My research interests are primarily in the areas of Bayesian hierarchical models and related methods of inference, with particular attention to analyses of overdispersed and correlated data arising from studies with spatial, nested, and multilevel designs. My work to date has tackled a range of statistical issues and topics, including Bayes and empirical Bayes disease mapping; hierarchical/multilevel models; spatio-temporal models; spline smoothing; mapping of non-rare events/diseases; Bayesian ecological models; prior selection, identifiability, and Bayesian learning issues in Bayesian hierarchical and spatial modeling; Bayesian inference via Markov chain Monte Carlo (MCMC); and full Bayesian (FB), empirical Bayesian (EB), and bootstrap methodologies for hierarchical model inference. My research interests also extend to various substantive research fields, including spatial epidemiology, disease surveillance, health services and outcomes research, population health, health policy, and environmental health. In this presentation, I will give a brief review of my recent work and projects.
DATE/PLACE: Tuesday, April 06, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Estimation of a scale parameter under contamination: Unidentifiability and bias-robustness
SPEAKER: Dr. John Collins,
Department of Mathematics and Statistics,
University of Calgary
.

We consider the problem of finding a bias-robust estimator of the scale parameter of a symmetric unimodal parametric distribution F when a proportion of the observations arise from an arbitrary contaminating distribution. The scale functionals considered are those that are location invariant, scale equivariant, Fisher consistent at F, and weakly continuous. Lower bounds for the (suitably defined) maximal asymptotic biases are derived, as well as lower bounds on gross-error sensitivities (the limiting case as the proportion of contamination approaches 0). Computational verifications show that these lower bounds are surprisingly tight in some cases.

A key step in developing the minimax bias theory is to define an appropriate bias functional that is not only scale equivariant but also explicitly takes into account the unidentifiability of the scale parameter induced by the contamination model. For the case of the "p-shorth" (the shortest interval containing at least 100p% of the data), we show that under some general conditions on F, satisfied by the Cauchy distribution as well as all strongly unimodal distributions, the maximum asymptotic bias under our definition coincides with that under the usual definition used in the robustness literature. For the special case of F being the Cauchy distribution, we show that the 0.5-shorth (normalized to be consistent at F) not only attains the lower bound on gross-error sensitivity, but is also the scale estimator with minimax asymptotic bias under contamination, and this holds for all proportions of contamination between 0 and 1/2.
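The p-shorth itself is simple to compute from sorted data. This is a minimal sketch, assuming the length of the shortest interval is used as the scale statistic. The Cauchy normalization follows from the fact that, for a Cauchy(0, σ) distribution, the shortest interval with probability 1/2 is [-σ, σ], so half the 0.5-shorth length is consistent for σ. The bias theory in the abstract is not reproduced here.

```python
import math

def shorth_length(data, p=0.5):
    """Length of the shortest interval containing at least 100p% of the data."""
    xs = sorted(data)
    n = len(xs)
    h = math.ceil(p * n)  # number of points the interval must cover
    # Slide a window of h consecutive order statistics and keep the narrowest
    return min(xs[i + h - 1] - xs[i] for i in range(n - h + 1))

def cauchy_scale_shorth(data):
    # Half the 0.5-shorth length is Fisher consistent for sigma at Cauchy(0, sigma)
    return shorth_length(data, 0.5) / 2.0
```

Because the window only needs to cover half the sample, a minority of arbitrarily large outliers cannot stretch the interval, which is what makes the shorth attractive under contamination.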

DATE/PLACE: Thursday, April 01, 2004, 11:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Dragging Survey Sampling into the 21st century
SPEAKER: Dr. Matthias Schonlau,
The RAND Corporation, Santa Monica, CA
.

My talk will contain two parts: 1) a simulation based approach to survey design with an application to the health sciences and 2) an outline of some problems related to the analysis of web surveys.

In order to better inform study design decisions when sampling patients within and across health care providers we develop a simulation-based approach for designing complex multi-stage samples. The approach explores the tradeoff between competing design goals such as precision of estimates, coverage of the target population and cost. We elicit a number of sensible candidate designs, evaluate these designs with respect to multiple sampling goals, investigate their tradeoffs, and identify the design that is the best compromise among all goals. This approach recognizes that, in the practice of sampling, precision of the estimates is not the only important goal, and that there are tradeoffs with coverage and cost that should be explicitly considered. We construct a sample frame with all phase III clinical cancer treatment trials that are conducted by cooperative oncology groups of the National Cancer Institute from October 1, 1998 through December 31, 1999. Simulation results for our study suggest sampling a different number of trials and institutions than initially considered.

Web surveys have substantially gained in popularity in recent years. Because respondents are usually allowed to self-select into web surveys, however, most web surveys completely ignore the tried and true statistical concept of a random sample. As of today, a "right" way to adjust for the selection bias resulting from self-selection has not yet emerged from the literature. Researchers in biostatistics have long used Rosenbaum and Rubin's propensity score methodology for causal inference from observational data. I will outline challenges in applying this approach to survey sampling.
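A minimal sketch of the propensity idea in the survey setting, under strong assumptions that the talk itself questions: a reference probability sample is available, and a single categorical covariate (hypothetical here) captures selection into the web sample. With one categorical covariate, a saturated logistic model for P(web | x) reduces to cell proportions, and weighting web respondents by the inverse odds (1 - e)/e reproduces the reference sample's covariate mix.

```python
from collections import Counter

def propensity_weights(web_x, ref_x):
    """Weights for web respondents so their covariate mix matches a reference
    sample; web_x and ref_x hold one categorical covariate per respondent."""
    n_web, n_ref = Counter(web_x), Counter(ref_x)
    weights = []
    for x in web_x:
        e = n_web[x] / (n_web[x] + n_ref[x])  # estimated propensity P(web | x)
        weights.append((1 - e) / e)           # inverse odds of being in the web sample
    return weights
```

For example, if a hypothetical web sample is 80% "young" while the reference sample is 50% "young", the weights shrink the young respondents and inflate the old ones until the weighted shares are 50/50. Covariates that drive self-selection but are not measured remain uncorrected, which is the central difficulty for web surveys.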

DATE/PLACE: Thursday, March 25, 2004, 4:00pm
K9509, Applied Sciences Building
Simon Fraser University
TYPE: BRG Seminar
TITLE: Multiple Hypothesis Testing and Clustering: Statistical Methods for Analysis of High Dimensional Biological Data
SPEAKER: Dr. Katherine S. Pollard,
Center for Biomolecular Science & Engineering,
University of California, Santa Cruz
.
Discovering meaningful patterns in the wealth of data produced by gene expression experiments and genome sequencing projects requires rigorous statistical methods. Multiple testing problems arise whenever one wishes to perform statistical tests for each of many genes or genomic regions. Identifying differentially expressed genes from microarray experiments is a typical example. We have derived a general characterization of the null distribution for multiple testing that asymptotically controls type I error rates without conditions such as subset pivotality. This characterization is novel, because it utilizes the distribution of the test statistics rather than a data null distribution. A simple bootstrap estimator of this distribution is presented. I describe general single-step and step-down multiple testing procedures, as well as augmentation procedures designed to improve power. With a statistically significant subset in hand, clustering methods assist in the identification of patterns in the data. We have developed a hybrid clustering algorithm called HOPACH, which combines the strengths of both partitioning and agglomerative hierarchical clustering methods. Using this algorithm as an example, I demonstrate how the bootstrap can be employed as a statistical method in cluster analysis to establish the reproducibility of the clusters and the overall variability of the followed procedure. Applications to microarray data and comparative genomics illustrate the methodologies.
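The single-step versus step-down distinction can be illustrated with the two classical marginal-p-value procedures for family-wise error control: Bonferroni (single-step) and Holm (step-down). These are not the bootstrap-based procedures described in the talk, which rely on a null distribution for the joint test statistics; they are shown only to make the two procedure types concrete.

```python
def bonferroni(pvals):
    """Single-step adjusted p-values: every p-value is multiplied by m."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def holm(pvals):
    """Step-down (Holm) adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank); the
        # running maximum keeps adjusted p-values monotone in the ordering.
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adj[i] = running
    return adj
```

Holm's adjusted p-values are never larger than Bonferroni's, so the step-down version rejects at least as many hypotheses at the same family-wise error level.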
DATE/PLACE: Wednesday, March 24, 2004, 10:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Probabilistic graphical models for scene and object recognition
SPEAKER: Dr. Kevin P. Murphy,
CSAIL (Computer Science & Artificial Intelligence Laboratory),
MIT
.
Probabilistic graphical models are a way of combining multiple sources of noisy evidence together in a principled fashion, in order to come up with an optimal estimate of the hidden state of a system. Well-known examples include Kalman filters and HMMs. In this talk, I will show how we can use graphical models to perform fast and robust place and scene recognition. I will then show how to extend the model to detect objects such as cars, people, computers, etc. We use the output of the scene recognition system to decide which objects are likely to be present (for example, cars are unlikely in indoor scenes). Next we use global image features to predict the likely location of the object. Finally we apply a standard object detector (based on boosted decision stumps) to the image. The various sources of information are combined using a discriminatively trained graphical model (a conditional random field). We discuss some methods for efficiently training such models, and demonstrate our system on a challenging dataset of indoor and outdoor images collected with a wearable camera.
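As a concrete example of combining noisy evidence over time, the sketch below runs the forward (filtering) recursion of a discrete hidden Markov model, the discrete-state analogue of a Kalman filter. The two-place toy model and its probabilities are hypothetical, and the talk's system (global image features, boosted detectors, a conditional random field) is far richer.

```python
def forward_filter(prior, trans, emit, observations):
    """Return filtered distributions P(state_t | obs_1..t) for a discrete HMM.

    prior[s]     : P(state_1 = s)
    trans[s][s2] : P(state_{t+1} = s2 | state_t = s)
    emit[s][o]   : P(obs = o | state = s)
    """
    states = list(prior)
    belief = dict(prior)
    history = []
    for t, o in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition model
            belief = {s2: sum(belief[s] * trans[s][s2] for s in states)
                      for s2 in states}
        # Update: weight by the evidence for each state and renormalize
        unnorm = {s: belief[s] * emit[s][o] for s in states}
        z = sum(unnorm.values())
        belief = {s: v / z for s, v in unnorm.items()}
        history.append(belief)
    return history

# Hypothetical two-place model: a "desk" feature is common in offices,
# a "door" feature is common in corridors, and places tend to persist.
prior = {"office": 0.5, "corridor": 0.5}
trans = {"office": {"office": 0.9, "corridor": 0.1},
         "corridor": {"office": 0.1, "corridor": 0.9}}
emit = {"office": {"desk": 0.8, "door": 0.2},
        "corridor": {"desk": 0.1, "door": 0.9}}
beliefs = forward_filter(prior, trans, emit, ["desk", "desk", "door"])
```

The sticky transition matrix is what lets consistent evidence accumulate: a single ambiguous frame moves the belief only partway, while a run of agreeing frames drives it toward certainty.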
DATE/PLACE: Thursday, March 11, 2004, 11:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: High breakdown point regression estimates for censored data
SPEAKER: Dr. Matías Salibián-Barrera,
School of Mathematics and Statistics,
Carleton University
.

In this talk I will discuss robust regression estimates for censored data. The extension of the Least Squares estimate to the case of censored responses was first proposed by Miller (1976) and later revisited by Buckley and James (1979). More recently Ritov (1990) and Lai and Ying (1994) studied M-estimates for censored responses. Unfortunately, these estimates require a monotone estimating equation and hence are only robust against low-leverage outliers. We propose an extension of high breakdown regression estimates to the case of censored responses. In particular, our approach extends the Least Median of Squares estimates [Rousseeuw, 1984], the S-estimates [Rousseeuw and Yohai, 1984], the MM-estimates [Yohai, 1987], and the tau-estimates [Yohai and Zamar, 1988]. I will illustrate our proposal with an example, discuss its properties, and present results of a Monte Carlo study that explored its finite-sample properties. If time permits I will also present an algorithm to compute these estimates.

(Joint work with Victor Yohai - University of Buenos Aires)
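For reference, the uncensored Least Median of Squares estimator (Rousseeuw, 1984), one of the estimators the talk extends, can be sketched for a straight-line fit by searching over lines through pairs of points. This sketch does not handle censoring, and exhaustive pair search is only practical for small samples.

```python
from itertools import combinations
from statistics import median

def lms_line(x, y):
    """Least Median of Squares fit of y = a + b*x, minimizing the median
    squared residual over all lines through two data points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line: no finite slope
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = median((yk - a - b * xk) ** 2 for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]
```

Because the criterion is a median, a fit that matches the bulk of the data wins even when a large minority of responses are gross outliers, which least squares cannot do.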

DATE/PLACE: Tuesday, March 09, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: On the Mean Squared Prediction Error for Small Area Estimation
SPEAKER: Dr. Tapabrata Maiti,
Iowa State University
.
Small area estimation has drawn enormous attention in recent years due to its wide range of applications, particularly in socio-economic studies and policy making. A direct small area estimator, based only on the small sample available within an area, is likely to have unacceptably large variance, so there is a need to construct model-based estimators with low mean squared prediction error (MSPE). Estimation of the MSPE, and in particular bias correction of the estimated MSPE, plays a central role in small area estimation research. In this article, a new technique for bias correction of the estimated MSPE is proposed. It is shown that the new MSPE estimator attains the same level of bias-correction accuracy as the existing estimators based on Taylor expansion and jackknife methods. Moreover, unlike the existing methods, the proposed estimate of MSPE is always nonnegative.
DATE/PLACE: Monday, March 01, 2004, 9:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Iterative Conditional Fitting for Gaussian Ancestral Graph Models
SPEAKER: Mathias Drton,
Department of Statistics,
University of Washington
.

Ancestral graph models, introduced by Richardson and Spirtes (2002), are a new class of graphical models that generalizes both Markov random fields (underlying undirected graph) and Bayesian networks (underlying DAG = directed acyclic graph). A key feature of ancestral graphs is that they can encode all conditional independence structures which may arise from a Bayesian network/DAG model with selection and unobserved variables.

In this talk, we consider Gaussian ancestral graph models and present a new algorithm for maximum likelihood estimation. We call this new algorithm iterative conditional fitting (ICF) since in each step of the procedure, a conditional distribution is estimated, subject to constraints, while a marginal distribution is held fixed. We show that in the considered Gaussian case, ICF may be implemented by regressions on "pseudo-variables". The ICF algorithm is dual to the well-known iterative proportional fitting algorithm, in which a marginal distribution is fitted for a fixed conditional distribution. Finally, the ICF approach seems promising for future development of methodology in the case of discrete variables.

This is joint work with Thomas Richardson, Department of Statistics, University of Washington.

DATE/PLACE: Thursday, February 26, 2004, 4:00pm
AQ 5005, Academic Quadrangle
Simon Fraser University
TYPE: BRG Seminar
TITLE: Identification of Susceptibility Genes for Colon Neoplasia
SPEAKER: Denise Daley,
St. Paul's Hospital
.
In order to identify susceptibility genes that predispose individuals to colon neoplasia, the Cleveland Colon Neoplasia Sibling Study (CNSS) has enrolled over 6,000 individuals affected with either colon cancer or adenomatous polyps and has recently completed a genome-wide linkage scan on 241 of these families, consisting of 935 individuals. To account for potential locus heterogeneity and more complex disease models, subgroups were identified for analysis based on clinically relevant phenotypes. Results from the severe histopathology subgroup will be presented. The severe histopathology subgroup consists of 53 families in which a sib pair is affected with invasive cancer, high-grade dysplasia > 1 cm, or tubulovillous adenomas > 1 cm. Significant evidence for linkage to a new region on chromosome 9 has been identified for the severe histopathology subgroup, and our results have recently been published in PNAS (PNAS 100:12961-12965, 2003). Our analysis was performed using model-free linkage methods, which do not require specification of the disease model parameters. However, model-free methods can be used to estimate disease model parameters, and examples will be presented.
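A classic model-free statistic for affected sib pairs is the mean test of identity-by-descent (IBD) sharing: under no linkage each pair shares on average half its alleles IBD, with variance 1/8 per pair. The sketch below assumes IBD sharing at the marker is known exactly for each pair (in practice it is estimated from marker data); it illustrates the model-free idea and is not necessarily the statistic used in the CNSS analysis.

```python
import math

def asp_mean_test(ibd_proportions):
    """Mean test for linkage in affected sib pairs.

    Each entry is the proportion of alleles shared IBD (0, 0.5 or 1) at a
    marker. Under no linkage the expected sharing is 1/2 with variance 1/8
    per pair; excess sharing suggests linkage."""
    n = len(ibd_proportions)
    mean = sum(ibd_proportions) / n
    z = (mean - 0.5) / math.sqrt(1 / (8 * n))
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal p-value
    return mean, z, p_one_sided
```

No penetrances or allele frequencies enter the statistic, which is what "model-free" means here: only the excess allele sharing among affected relatives is measured.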
DATE/PLACE: Friday, February 20, 2004, 10:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar / BRG Seminar
TITLE: Model Diagnostics for Smoothing Spline ANOVA Models
SPEAKER: Dr. Chong Gu,
Department of Statistics,
Purdue University
.
Functional ANOVA decompositions can be incorporated in multivariate function estimation through the penalized likelihood method. In this talk, we present some simple diagnostics for the "testing" of selected model terms in the decomposition; the elimination of practically insignificant terms generally enhances the interpretability of the estimates, and sometimes may also have inferential implications. We aim to accomplish the tasks of traditional likelihood ratio tests, but without sampling distributions, which are unavailable because the null hypotheses in nonparametric settings are typically infinite dimensional. The diagnostics are illustrated in the settings of logistic regression and density estimation.
DATE/PLACE: Thursday, February 19, 2004, 10:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar / BRG Seminar
TITLE: Parameter-Driven Models for Time Series of Count Data
SPEAKER: Dr. Rachel MacKay,
Postdoctoral Fellow, Department of Biostatistics,
University of Washington
.

Modelling autocorrelated count data is a challenging problem. Unlike the situation for continuous data, for which the multivariate normal distribution is available, for count data there is no convenient and flexible class of multivariate distributions that can capture the shape of the distribution and the autocorrelation. Parameter-driven (latent variable) models are one means of overcoming these difficulties. However, these models can be difficult to interpret and to estimate -- and to apply in the absence of information about the latent process.

In this talk, we formulate a general class of parameter-driven models for count data. Our formulation is conducive to relatively simple interpretation of the models, and to robust estimation of the regression coefficients. Furthermore, our class includes two popular models: the generalized linear mixed model (GLMM) and the hidden Markov model (HMM). We will describe, based on our specification of the class, the connection between these two models and their common properties.

We apply these ideas to the analysis of both multiple sclerosis and polio incidence data.

This work is joint with Brian Leroux of the University of Washington.
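To see why a latent process helps, the sketch below simulates from one simple parameter-driven model: Poisson counts whose log mean follows a Gaussian AR(1) process. The parameter values are arbitrary assumptions and the Poisson sampler is a textbook method; this illustrates the model class, not the speakers' estimation procedure.

```python
import math
import random

def poisson(rng, lam):
    # Knuth's multiplication method; adequate for the moderate means used here
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_parameter_driven(n, beta0=1.0, phi=0.8, sigma=0.5, seed=0):
    """Simulate counts y_t ~ Poisson(exp(beta0 + z_t)) driven by a latent
    Gaussian AR(1) process z_t = phi * z_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, sigma / math.sqrt(1 - phi ** 2))  # stationary start
    ys = []
    for _ in range(n):
        ys.append(poisson(rng, math.exp(beta0 + z)))
        z = phi * z + rng.gauss(0.0, sigma)
    return ys
```

The latent process induces both overdispersion (sample variance well above the mean) and positive autocorrelation in the counts. Replacing the AR(1) process with independent normal effects gives a Poisson GLMM, and replacing it with a finite-state Markov chain gives a Poisson HMM, which is one way to see both models as members of a common class.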


DATE/PLACE: Tuesday, February 24, 2004, 11:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: SHIPSL: A New Health Index for Puget Sound Lowland Streams
SPEAKER: Dr. Grace Chiu,
Postdoctoral Fellow, PIMS and Visiting Scholar,
University of Washington
.

Stream health is often measured by the multimetric benthic index of biotic integrity (B-IBI). For Puget Sound Lowland (PSL), the B-IBI comprises ten metrics which quantify the well-being of benthic inhabitants of the stream. Each metric is converted to a score of 1, 3, or 5, where a higher value indicates a healthier stream with respect to the metric. Summing the metric scores yields the B-IBI. Stream health is then rated as very poor, poor, fair, good, or excellent according to the index value.

Discretization is the conventional method to standardize B-IBI metric values measured on different scales. This scoring scheme requires subjective and space/time-dependent input on the cutoff points of metric scales. In contrast, simple statistical standardization (centering and division by the standard deviation) appears to be more natural, is non-study-specific, and maps the metric space onto a continuous scale centered at 0. Our stream health index for the Puget Sound Lowland (SHIPSL) is the sum of the ten metric values standardized as such.

Bootstrap simulation results show that SHIPSL is a more efficient measure of stream health. Without sacrificing information on biotic integrity, SHIPSL reduces bias and variability of the health index, and eliminates sensitivity of the rating to slight changes in the metric scoring scheme.
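The two standardization schemes can be contrasted directly. In this sketch the two metrics, their values, and the scoring cutoffs are hypothetical (the real B-IBI uses ten metrics with published cutoffs), and all metrics are assumed to be oriented so that larger values indicate a healthier stream.

```python
from statistics import mean, stdev

def bibi_scores(metric_values, cutoffs):
    """Conventional scoring: each metric becomes 1, 3 or 5 using a pair of
    (low, high) cutoffs per metric, and the scores are summed."""
    total = 0
    for v, (low, high) in zip(metric_values, cutoffs):
        total += 1 if v < low else (3 if v < high else 5)
    return total

def shipsl(sites):
    """Sum of metric values standardized (centered, divided by the SD)
    across sites; sites is a list of per-site metric lists."""
    n_metrics = len(sites[0])
    cols = [[site[j] for site in sites] for j in range(n_metrics)]
    stats = [(mean(c), stdev(c)) for c in cols]
    return [sum((site[j] - m) / s for j, (m, s) in enumerate(stats))
            for site in sites]
```

The discretized score jumps whenever a metric crosses a cutoff, so a small change in a cutoff can change a site's rating; the standardized sum varies continuously with the metric values and needs no cutoffs.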

DATE/PLACE: Wednesday, February 18, 2004, 2:30pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: BRG Seminar
TITLE: More (Mis?)Adventures with Mixed Models
SPEAKER: Dr. Rollin Brant,
Department of Community Health Science,
University of Calgary
.
There has been rapid growth in the use of mixed models over the last decade. During that period statisticians have encountered a number of instances of apparent paradoxical behaviour in associated estimation methods. Most recently, we have learned of the importance of the assumption of independence between random cluster effects and cluster level covariates that is implicit in conventional marginal likelihood based approaches. In this presentation I will explore this issue in the context of comparing (or "validating") measurement methods and will demonstrate the utility of multivariate (i.e. multi-response) mixed models.
DATE/PLACE: Monday, February 16, 2004, 10:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Earthquakes, point processes, and prototypes
SPEAKER: Dr. Rick P. Schoenberg,
Department of Statistics,
University of California at Los Angeles
.
Earthquake occurrences are frequently characterized using point process models. An important example is the epidemic-type aftershock sequence (ETAS) model of Ogata (1988, 1998). Assessing goodness-of-fit for such models is typically quite difficult. This talk begins by exploring some model evaluation techniques for multi-dimensional point process models. One method involves rescaled residuals, obtained by sliding points along one coordinate to form a homogeneous Poisson process inside a random, irregular boundary. This and another method involving random thinning are applied to point process models for the space-time-magnitude distribution of earthquake occurrences, and suggest ways of improving upon the ETAS model. The second part of this talk addresses the question: what does a typical aftershock sequence look like? Prototype point patterns, defined as patterns minimizing the sum of squared distances from a collection of point process realizations, may be used to address this question. Other seismological applications of prototype point patterns are also explored.
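The rescaling idea is easiest to see in one dimension. For a purely temporal self-exciting model, the compensator Λ(t), the integral of the conditional intensity up to time t, transforms the observed times into a unit-rate Poisson process when the model is correct. The sketch below assumes an exponential triggering kernel for simplicity (the ETAS model uses an Omori-type power law with magnitude dependence) and does not implement the multi-dimensional version described in the talk.

```python
import math

def rescaled_times(times, mu, alpha, beta):
    """Time-rescaled residuals tau_i = Lambda(t_i) for a temporal Hawkes model
    with intensity mu + sum_{t_j < t} alpha * exp(-beta * (t - t_j)).
    times must be sorted; under the fitted model the tau_i form a
    unit-rate Poisson process."""
    taus = []
    for i, t in enumerate(times):
        # Integral of each earlier event's triggered intensity up to time t
        excitation = sum(alpha / beta * (1 - math.exp(-beta * (t - tj)))
                         for tj in times[:i])
        taus.append(mu * t + excitation)
    return taus
```

If the model fits, the gaps between successive rescaled times should behave like independent Exp(1) variables, which can be checked with a Q-Q plot or a Kolmogorov-Smirnov test.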
DATE/PLACE: Monday, February 02, 2004, 9:00am
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar
TITLE: Conjugate Dirichlet Process Mixture Models: Gene Expression, Efficient Sampling, and Clustering
SPEAKER: David Dahl,
Department of Statistics / Department of Biostatistics and Medical Informatics,
University of Wisconsin-Madison
.
This talk proposes a novel conjugate Dirichlet process mixture (DPM) model for the analysis of gene expression data, introduces a new MCMC sampling algorithm for fitting general conjugate DPM models, and describes a quick mode-finding algorithm for clustering in a particular class of conjugate DPM models. Since biologists are typically interested in expression patterns over a variety of treatment conditions, the proposed model clusters genes having similar patterns of expression (instead of similar levels of expression) and naturally incorporates any number of treatment conditions. Further, hypotheses are easily tested and false discovery rates are readily estimated. The second part of the talk addresses formidable computational issues arising in the use of DPM models by introducing a new MCMC sampling algorithm for any (not just the gene expression model) conjugate DPM model. Simulations indicate that the proposed sampler can be significantly faster than existing methods. The new algorithm is a merge-split sampler which uses ideas similar to those in sequential importance sampling. Finally, in the case of two treatment conditions, a very quick clustering algorithm is introduced which is guaranteed to find the mode of the posterior clustering distribution in a class of conjugate DPM models. Pre-prints are available at http://www.stat.wisc.edu/~dbdahl.
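The clustering prior that a Dirichlet process mixture places on partitions (the Chinese restaurant process) is a building block of every DPM sampler, including merge-split samplers. The sketch below evaluates its log probability for one specific partition, given the cluster sizes; it is background for the abstract, not the proposed sampler or clustering algorithm.

```python
import math

def crp_log_prob(cluster_sizes, alpha):
    """Log prior probability of one specific partition of n items under the
    Chinese restaurant process with concentration alpha:
    alpha^K * Gamma(alpha) / Gamma(alpha + n) * prod_k (n_k - 1)!"""
    n = sum(cluster_sizes)
    k = len(cluster_sizes)
    return (k * math.log(alpha) + math.lgamma(alpha) - math.lgamma(alpha + n)
            + sum(math.lgamma(s) for s in cluster_sizes))  # lgamma(s) = log((s-1)!)
```

Larger alpha favors more, smaller clusters. In a conjugate DPM the posterior over partitions is this prior times the marginal likelihood of each cluster, which is why samplers only need ratios of such terms when proposing merges or splits.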
DATE/PLACE: Thursday, January 29, 2004, 4:00pm
K9509, Applied Sciences Building
Simon Fraser University
TYPE: BRG Seminar
TITLE: Interim Analyses in Clinical Trials
SPEAKER: Dr. Joan Hu,
Department of Statistics and Actuarial Science,
Simon Fraser University
.
Formal interim analyses of clinical trials based on prespecified termination criteria (stopping rules) are commonly used to monitor the evolving efficacy of treatment regimens. In some circumstances, however, relatively little may be known about the nature and magnitude of the expected treatment effects or the clinical significance of differences in the outcome measure. There may be no obvious metric for comparing treatment groups. This talk presents an approach motivated by these considerations. The approach is based on the use of repeated confidence bands (RCB) for the mean of the response process. I will describe how RCB can be adapted to cope with situations where the domain of concern varies as the study proceeds or where a test of hypotheses is of primary interest. Examples involving recent HIV/AIDS clinical trials will be used for illustration.
DATE/PLACE: Thursday, January 15, 2004, 4:00pm
Leonard S. Klinck 301
6356 Agricultural Road, UBC
TYPE: Statistics Seminar / BRG Seminar
TITLE: Distance Weighted Discrimination and Geometrical Representation of High Dimension - Low Sample Size data
SPEAKER: Dr. J.S. Marron,
Statistics Department,
University of North Carolina and SAMSI
.
The Support Vector Machine is a discrimination method that was developed in the machine learning community. Statistical ideas are used to improve it in the important context of High Dimension - Low Sample Size data, resulting in a new method called Distance Weighted Discrimination. The ideas are illustrated with some examples from micro-array analysis. Some unexpected behavior is explained using a non-standard asymptotic analysis as the dimension tends to infinity.
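One standard way to see the non-standard HDLSS geometry is by simulation: for i.i.d. standard Gaussian points in dimension d with the sample size held fixed, the pairwise distances divided by sqrt(d) all approach sqrt(2), so the data sit near the vertices of a regular simplex. The sketch below checks this numerically; it does not implement DWD, and the dimension and sample size are arbitrary choices.

```python
import math
import random

def pairwise_distance_ratios(n_points=5, dim=20000, seed=0):
    """Distances between i.i.d. N(0, I_d) points, divided by sqrt(d).
    As d grows with n fixed, every ratio concentrates near sqrt(2)."""
    rng = random.Random(seed)
    pts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    ratios = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            d2 = sum((a - b) ** 2 for a, b in zip(pts[i], pts[j]))
            ratios.append(math.sqrt(d2 / dim))
    return ratios
```

With every point nearly equidistant from every other, the randomness in the data is concentrated in a low-dimensional configuration, which is the kind of structure a non-standard "d tends to infinity" asymptotic analysis can exploit.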
DATE/PLACE: Thursday, January 15, 2004, 2:00pm
Math 103
TYPE: Statistics Seminar joint with Probability seminar
TITLE: Is there total inference? Saddlepoint and beyond
SPEAKER: Dr. D.A.S. Fraser,
Department of Statistics,
University of Toronto
.

Saddlepoint theory provides highly accurate approximations for density and distribution functions using an available cumulant generating function. The density approximation was introduced to statistics by Daniels in 1954 and given a broader context by Barndorff-Nielsen and Cox in 1979. But the greater implications for statistics only emerged with the distribution function approximation of Lugannani and Rice in 1980. Research since then has had substantial implications for statistical theory. These recent results will be surveyed and some presently apparent directions indicated.

The saddlepoint approximation is available in contexts where a cumulant generating function is accessible, but its high accuracy is available in much greater generality. The stand-in for the cumulant generating function is an observed likelihood function, provided its argument has been appropriately calibrated; the calibration involves only the locally determined form of the likelihood function near the observed data.

The availability of an accurate approximation for a probability is of interest, but the real importance comes from the related asymptotic theory that provides a widely unique decomposition of a statistical model and the determination of a p-value free of nuisance parameters. These recent statistical methods will be reviewed together with examples.
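The Lugannani-Rice distribution-function approximation can be checked on a case where the cumulant generating function is explicit. For a sum S of n i.i.d. Exp(1) variables, K(t) = -n log(1 - t), and the exact tail probability is available for integer n. This sketch illustrates the 1980 approximation mentioned above, not the likelihood-based extensions surveyed in the talk.

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def lugannani_rice_gamma_tail(s, n):
    """Saddlepoint (Lugannani-Rice) approximation to P(S >= s), where S is a
    sum of n i.i.d. Exp(1) variables with CGF K(t) = -n*log(1 - t), t < 1.
    Not valid at s == n, where the saddlepoint is 0 and w = u = 0."""
    t_hat = 1 - n / s                    # solves K'(t) = n / (1 - t) = s
    K = -n * math.log(1 - t_hat)
    w = math.copysign(math.sqrt(2 * (t_hat * s - K)), t_hat)
    u = t_hat * s / math.sqrt(n)         # t_hat * sqrt(K''(t_hat)), K'' = n / (1 - t)^2
    return 1 - norm_cdf(w) + norm_pdf(w) * (1 / u - 1 / w)

def exact_gamma_tail(s, n):
    # For integer n: P(S >= s) = exp(-s) * sum_{k < n} s^k / k!
    return math.exp(-s) * sum(s ** k / math.factorial(k) for k in range(n))
```

For n = 5 and s = 10 the saddlepoint value agrees with the exact tail probability to within a fraction of a percent, whereas the plain normal approximation is off by more than half.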


 


Department of Statistics

Department of Statistics, University of British Columbia
3182 Earth Sciences Building
2207 Main Mall
Vancouver, BC, Canada V6T 1Z4
Tel: 604.822.0570
Fax: 604.822.6960
