Seminars

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 30th November 2006
4:00pm
Department of Statistics, UBC
An Overview of Missing Data Problems in Mixed Models

BRG Mixed Models Workshop #3. See also the talks by Robert Prosser (Nov 16) and Harry Joe (Nov 2).

In this talk, I will give an overview of missing data problems in mixed-effects (random-effects) models. The problems covered include missing covariates, missing responses, dropouts, measurement errors, and censoring. The missing data mechanisms include ignorable and nonignorable missingness. The mixed models include linear and nonlinear mixed-effects models, generalized linear mixed models, and survival models with random effects (frailty models). Missing data methods include EM algorithms and multiple imputation. I will also discuss the computational difficulties these problems raise and approximate methods for addressing them.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 28th November 2006
4:00pm
Gabriela V. Cohen Freue
University of British Columbia
A Robust Instrumental Variables Estimator
We consider the problem of estimation in a linear model when some covariates, called endogenous covariates, are correlated with the error term. In this case, ordinary least squares (OLS) estimators are inconsistent for the regression parameters. A common approach to address this problem is to use additional variables, called “instrumental variables”, to construct consistent instrumental variables estimators analogous to OLS. Despite their widespread use, the ordinary instrumental variables estimator and its most efficient version, known as the two-stage least squares estimator (2SLS), are highly sensitive to outliers in the response variable, the covariates, and the instruments. In this paper, we propose a robust instrumental variables estimator (RIV) based on a high breakdown point S-estimate of multivariate location and scatter. RIV has a bounded influence function and a high breakdown point. Moreover, RIV is consistent under weak distributional assumptions and asymptotically normal under certain regularity conditions. We derive its asymptotic variance matrix and use it to calculate standard errors for the estimated regression coefficients. In addition, RIV is computationally inexpensive and provides a natural robustification of the ordinary instrumental variables estimator. We illustrate the performance of RIV and a proposed diagnostic tool using two real datasets and an extensive simulation study under clean and contaminated datasets.
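The contrast between OLS and the ordinary (non-robust) instrumental variables estimator described in this abstract can be sketched on simulated data. This is only an illustration under stated assumptions: the data-generating model, the single instrument `z`, and the function names are hypothetical, and the code shows the classical IV slope, not the RIV estimator proposed in the talk.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def simulate(n=20000, beta=2.0, seed=0):
    """Simulate y = beta * x + e, where x is endogenous (it shares u with e)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)        # instrument: drives x, independent of e
        ui = rng.gauss(0, 1)        # unobserved component shared by x and e
        xi = zi + ui                # endogenous covariate
        ei = ui + rng.gauss(0, 1)   # error term, correlated with x through u
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ei)
    return z, x, y

z, x, y = simulate()
ols_slope = cov(x, y) / cov(x, x)   # inconsistent: tends to beta + 0.5 in this setup
iv_slope = cov(z, y) / cov(z, x)    # consistent: tends to beta
print(f"OLS slope: {ols_slope:.3f}, IV slope: {iv_slope:.3f}")
```

With one instrument and one covariate, the IV slope coincides with 2SLS. Both are ratios of sample covariances, so a few gross outliers can move them arbitrarily, which is the sensitivity that motivates a robust alternative.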
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 21st November 2006
4:00pm
University of Victoria Department of Mathematics and Statistics
Symmetry of minimax regression designs
Designs that are robust against misspecification of the regression response will be introduced and discussed using a minimax approach. Several design criteria, such as A-optimality, D-optimality and Q-optimality, can be used to construct minimax designs, which are found to have simple density functions on the design space. An important issue for minimax designs is the symmetry of these density functions. If a density function is symmetric, its computation can be simplified. We show that a symmetric D-optimal minimax design exists if the design space is symmetric. Examples of minimax designs will be given. Whether A-optimal or Q-optimal minimax designs are symmetric remains an open problem.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 16th November 2006
4:00pm
Robert Prosser
Pharmaceutical Outcomes and Policy Innovation (POPi) research unit, Pharmaceutical Sciences UBC
Trends and Regional Variation in Asthma Medication Use in BC (1996-2001)
This talk is a part of the Mixed Effects Models Workshop. See also a presentation by Harry Joe on Nov 2.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 14th November 2006
4:00pm
Bioinformatics Centre University of British Columbia
Large-scale data mining of gene expression patterns for functional discovery
A major role of bioinformatics is exploiting the accumulated wealth of high-throughput genomics data. While methods are well-established for analyzing large DNA databases, analyzing microarray gene expression profiles in large databases remains a challenge. These data are often analyzed individually, published, and then stored in archives that do not facilitate further exploration across data sets. I will describe work in my group in which we are applying meta-analytical approaches to this problem. Our focus is on developing rapid methods that can provide "on demand" analyses in response to queries about specific genes constrained to selected microarray data sets. I will also discuss how we are applying our tools to analysis of gene function.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 2nd November 2006
4:00pm
Dept. of Statistics, UBC
Computing methods in generalized linear and non-linear mixed models

This talk is a part of the Mixed Effects Models Workshop. See also the Nov 16 presentation by Robert Prosser.

Generalized linear models and non-linear Gaussian models with random effects are reasonable models for clustered/longitudinal data because they account for between-cluster variability and within-cluster dependence. Simple examples include logistic regression and Poisson regression with an intercept that is random over clusters, and nonlinear regression models with coefficients that are random over clusters.

There are several recent books on random effects in generalized linear models and non-linear Gaussian models, but they make different recommendations on the computing methods and approximations for maximum likelihood estimation. I think there has been insufficient rigor in the comparison of the various approximations to the multidimensional integrals in the likelihood. In this talk, I will give an introduction to the methods that can be considered: Gauss-Hermite quadrature, the Laplace approximation, Monte Carlo EM, bivariate composite likelihood, and their relatives. I will explain my view of Lee and Nelder's h-likelihood for GLMMs. The availability of statistical software and code will also be discussed.
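As a rough illustration of one of the methods listed above, Gauss-Hermite quadrature, the following sketch approximates the marginal likelihood of one cluster in a random-intercept logistic model. The three-point rule, the parameter values, and the function names are assumptions chosen for brevity; practical software typically uses more nodes or an adaptive version.

```python
import math

# Three-point Gauss-Hermite rule for integrals of f(x) * exp(-x**2).
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)

def normal_expectation(g, sigma):
    """Approximate E[g(b)] for b ~ N(0, sigma^2) using b = sqrt(2) * sigma * x."""
    total = sum(w * g(math.sqrt(2) * sigma * x) for x, w in zip(NODES, WEIGHTS))
    return total / math.sqrt(math.pi)

def cluster_likelihood(ys, beta0, sigma):
    """Marginal likelihood of binary responses ys from one cluster,
    integrating the random intercept out of a logistic model."""
    def conditional(b):
        p = 1 / (1 + math.exp(-(beta0 + b)))
        return math.prod(p if y == 1 else 1 - p for y in ys)
    return normal_expectation(conditional, sigma)

# Hypothetical cluster with four binary responses
lik = cluster_likelihood([1, 0, 1, 1], beta0=0.2, sigma=1.0)
print(lik)
```

A three-point rule integrates polynomials up to degree five exactly against the normal density, so it reproduces the second and fourth moments of the random effect. The logistic integrand is not polynomial, so the likelihood value is only an approximation.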

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 31st October 2006
4:00pm
Biostatistics Johns Hopkins University
On Missing Data and Interactions in SNP Association Studies
In this presentation we discuss possible solutions for two common problems in SNP association studies: the presence of missing data in the covariates, and the search for and evaluation of models allowing for higher-order SNP-SNP and SNP-environment interactions. The majority of SNP association studies are based on data with missing genotype information. The most common approach to dealing with such missing data is to omit the observations that have missing records in the model's covariates. This approach, however, can have severe shortcomings for statistical inference, namely potential bias in the parameter estimates and loss of power. The latter can be overwhelming, especially when SNP-SNP interactions are considered. We show some examples that illustrate the shortcomings of omitting observations, and compare some methods for addressing the missing data issue. In particular, we propose a novel tree-based imputation algorithm as a solution, and demonstrate how this approach can be used to draw valid statistical inference in the search for and assessment of SNP-SNP interactions, using the Logic regression methodology.
Statistics
Rooms 1200-1500, SFU Segal Graduate School of Business, 500 Granville St
Thu 26th October 2006
4:00pm
Statistics Iowa State University
van Eeden Lecture: Nonparametric Variance Estimation for Systematic Samples
Systematic sampling is a frequently used sampling method in surveys because of its ease of implementation and its design efficiency. An important drawback of systematic sampling, however, is that no direct estimator of the design variance is available. We describe a new estimator of the model-based expectation of the design variance, under a nonparametric model for the population. The nonparametric model is sufficiently flexible that it can be expected to hold at least approximately in many practical situations. We prove the consistency of the estimator for both the anticipated variance and the design variance under the nonparametric model. The approach is applied to a forest survey dataset, on which we compare a number of design-based and model-based variance estimators.
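The design variance mentioned in this abstract can be made concrete with a small enumeration. The sketch below assumes a hypothetical population of 20 values with a linear trend and a 1-in-5 systematic design; it lists every possible sample and computes the variance of the sample mean over them. It does not implement the nonparametric estimator described in the talk.

```python
def systematic_samples(population, k):
    """Return all k possible 1-in-k systematic samples, one per random start."""
    return [population[start::k] for start in range(k)]

def mean(values):
    return sum(values) / len(values)

population = list(range(1, 21))   # hypothetical population with a linear trend
k = 5
sample_means = [mean(s) for s in systematic_samples(population, k)]

# Design variance of the sample mean: variance over the k equally likely samples
design_variance = mean([(m - mean(population)) ** 2 for m in sample_means])
print(sample_means, design_variance)
```

In practice only one of the k samples is drawn, so this between-sample variance cannot be computed from the observed data alone, which is why a model is needed to estimate it.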

This talk is funded by the Constance van Eeden Fund through UBC and by MITACS: Mathematics of Information Technology and Complex Systems through SFU.
Statistics
LSK 460, 6356 Agricultural Road, UBC
Tue 24th October 2006
4:00pm
Statistics Iowa State University
van Eeden Lecture: Sampling Design and Estimation for Natural Resource Surveys
Natural resource surveys often have unique characteristics that differentiate them from non-survey studies of natural resources, and also from human population surveys. In this talk, general design and estimation principles for statistical surveys of natural resources will be reviewed, and some modern developments will be discussed. Three large-scale surveys in the US will be used to illustrate the concepts: the National Resources Inventory, the Forest Inventory and Analysis, and the National Stream Assessment.
Statistics
LSK 460, 6356 Agricultural Road, UBC
Mon 23rd October 2006
3:30pm
Statistics Department, University of Georgia
Inference using Shape-Restricted Regression Splines

Nonparametric function estimation is appropriate when a parametric form is unknown. In practice, researchers prefer to use parametric models, because parameters are interpretable and useful inference procedures are available in statistical software packages. However, often the only valid assumptions are qualitative in nature: the expected value of the response must be increasing with the predictor variable; the growth curve must be increasing and concave; the time series trend must be decreasing. Perhaps the function can also be assumed to be smooth.

Hypothesis testing procedures in which the null hypothesis is the convenient parametric form and the alternative hypothesis encompasses only the known qualitative assumptions are therefore useful in practice. In this talk, methods are presented in which the alternative hypothesis involves assumptions about shape and smoothness. Shape-restricted regression splines are used for the fits to the data because, unlike the unrestricted versions, they are robust to knot choices. Exact tests for constant versus increasing and linear versus convex regression functions are presented, as well as a practical test for linear versus increasing regression functions. The methods are applied to several real-world datasets.
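To show what a shape restriction does to a fit, here is a minimal sketch of least-squares regression under a nondecreasing constraint, computed with the pool-adjacent-violators algorithm. This is a simpler stand-in chosen for illustration, not the regression-spline method of the talk, and it imposes monotonicity only, with no smoothness assumption.

```python
def isotonic_fit(y):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the latest one
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit

print(isotonic_fit([1, 3, 2, 4]))
```

Responses that violate the ordering are pooled and replaced by their average, so the fitted values are the closest nondecreasing sequence to the data in the least-squares sense.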

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 17th October 2006
4:00pm
Statistics Simon Fraser University
Factorial designs with multiple stages of randomization
The design and analysis of (fractional) factorial experiments with randomization restrictions has received considerable attention in recent years, motivated by studies of multi-stage processes or systems. These endeavors have given rise to seemingly unrelated methods of design construction, each specific to a design layout (e.g., split-plot, split-lot, strip-plot designs). In this talk, a general approach to the design of factorial experiments with randomization restrictions is presented. The construction includes most approaches in the literature as special cases, and is easily adapted to designs that combine different layouts. The proposed methodology is illustrated on a plutonium alloy manufacturing process.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 3rd October 2006
4:00pm
Computer Science University of British Columbia
On Probabilistic Response Surface Methods for POMDPs and Active Learning
I will begin by introducing a popular probabilistic response surface method for Bayesian experimental design. I will then illustrate how the method can be combined with simulation techniques in order to learn optimal policies for partially observed Markov decision processes (POMDPs). The emphasis will be on illustrating the technique with examples, such as stochastic planning for mobile robots, sensor network management and learning in video (or serious) games. I will also discuss an application of the method to human-computer interfaces and, in particular, focus on the domain of designing tools for animators. I will finish the talk by introducing some open problems that need to be attacked in order to improve the methodology and theory in this research area.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 14th September 2006
4:00pm
Dr. Markus Abt
F. Hoffmann-La Roche AG in Basel, Switzerland
Three-Arm Non-Inferiority Studies in Oncology: Testing for Superiority and Effect Retention

Non-inferiority studies aim at showing that a new experimental therapy is at least as efficacious as the currently available standard of care. When planning the study, the choice of the non-inferiority margin is most critical. Taking into account the benefit the current standard of care has shown in historic studies has been suggested and leads to the concept of effect retention. The principles behind this as well as the implications for the sample size are discussed during the first part of the presentation.

In oncology, combinations of two or more treatments are quite frequent. For example, it might be suspected that a combination AB of two agents A and B is superior to treatment with the current standard A alone. On the other hand, a third therapy X is considered to be non-inferior to A with regard to efficacy while at the same time offering advantages with regard to other endpoints. The combination XB could therefore be a candidate warranting further development. A two-arm study directly designed to show superiority of XB to A might, however, not be acceptable from a regulatory perspective. Three-arm studies that attempt to show superiority of the combination AB to A in a first step and non-inferiority of the combination XB to AB in a second step address both questions: superiority of AB to A as well as non-inferiority of X to A. We discuss the design of such studies, considering multiple power as well as alpha adjustments, and compare this approach to the alternative of conducting two independent studies designed to show superiority of AB to A first and non-inferiority of XB to AB second.

Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 17th August 2006
4:00pm
Dr. Philip Brown
Institute of Mathematics, Statistics & Actuarial Science, University of Kent, UK
Multiple modes in high dimensional regression settings

We adopt a Bayesian approach with priors for the regression coefficients that are scale mixtures of normal distributions and embody a high prior probability of proximity to zero. By seeking modal estimates we generalise the lasso in a way that provides automated variable selection and is adaptive through a range of models from ridge to lasso to quasi-Cauchy through extreme spiked versions to the limiting normal-Jeffreys.

Properties of the priors and their resultant posteriors are explored in the context of the linear and generalised linear model, especially when there are more variables than observations. We develop MAP algorithms that embrace the need to explore the multiple modes of the non-log-concave posterior distributions using multiple simulated perfectly fitting starting values. In the context of more variables than observations this multimodality is argued to be a desirable aspect of inference. The methodology is illustrated with a simulation study and a proteomic example. This work extends and further develops Griffin and Brown (2005).

 


Reference:

Griffin, J. E. and Brown, P. J. (2005). "Alternative prior distributions for variable selection with very many more variables than observations." Technical report, University of Kent.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 15th August 2006
4:00pm
Dr. Feifang Hu
Department of Statistics, University of Virginia
Adaptive Randomizations in Clinical Trials

While clinical trials may provide information on new treatments that can impact countless lives in the future, the act of randomization means that volunteers in the clinical trial will receive the benefit of the new treatment only by chance. In most clinical trials, an attempt is made to balance the treatment assignments equally, thus the probability that a volunteer will receive the potentially better treatment is only 50%. Response-adaptive randomization uses accruing data to skew the allocation probabilities to favor the treatment performing better thus far in the trial, thereby mitigating the problem to some degree.

In this talk, I give a brief review of adaptive randomizations. Then I propose some new response-adaptive randomization procedures that have some desirable properties. The resulting randomization procedures provide efficient methods to determine whether a new treatment is effective in a clinical trial, while simultaneously minimizing a clinical trial volunteer's chance of being assigned to the inferior treatment.
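The idea of skewing allocation toward the better-performing treatment can be sketched with a classical urn scheme, the randomized play-the-winner rule. This is a well-known textbook design used here only as an illustration, not one of the new procedures proposed in the talk; the success probabilities and trial size are hypothetical.

```python
import random

def play_the_winner(p_success, n_patients, seed=0):
    """Simulate a randomized play-the-winner urn for two treatments, 0 and 1."""
    rng = random.Random(seed)
    urn = [1, 1]          # start with one ball per treatment
    assigned = [0, 0]
    for _ in range(n_patients):
        # Draw a treatment with probability proportional to its ball count
        arm = 0 if rng.random() < urn[0] / sum(urn) else 1
        assigned[arm] += 1
        success = rng.random() < p_success[arm]
        # A success adds a ball for the same arm; a failure adds one for the other
        urn[arm if success else 1 - arm] += 1
    return assigned

assigned = play_the_winner(p_success=(0.7, 0.3), n_patients=2000)
share_better = assigned[0] / sum(assigned)
print(assigned, round(share_better, 3))
```

Under equal randomization about half of the volunteers would receive the better treatment. Here the urn composition drifts toward treatment 0 as its successes accumulate, so more than half of the volunteers are assigned to it.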

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 10th August 2006
4:00pm
Department of Biostatistics and Computational Biology, University of Rochester.
Modeling Longitudinal Dynamic Systems with Applications to Long-Term HIV Dynamics
In biomedical longitudinal studies, data are collected from a number of subjects over time. In some biomedical applications, the biological mechanisms are well studied and mathematical representations of the biomedical systems are available. The questions that concern us include (1) how to estimate the parameters in longitudinal dynamic systems, which are usually described by a set of differential or difference equations; (2) how to forecast future outcomes for both individual subjects and the whole population using the identified models; and (3) how to "borrow strength" across subjects under the setup of longitudinal dynamic systems. In this talk, I will present two different models for HIV dynamic systems from AIDS clinical trials. One is a deterministic model with a set of differential equations and the other is a state-space model. In both models, we consider the important features of longitudinal data, such as within-subject variation and between-subject variation as well as within-subject correlations. The hierarchical mixed-effects modeling idea is used in both models. A hierarchical Bayesian approach is proposed to estimate the parameters in the deterministic dynamic models, and several methods, such as a two-stage method, MLE and a Bayesian approach, are studied under the state-space model setup. Applications to AIDS clinical data will be presented to illustrate the methodologies. Some open questions in this area will be posed.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 4th August 2006
11:00am
Dr. Sam Weerahandi
Director of Analysis, Time/Warner RSM Research
Web-based Analytics for Decision Making
Publishing analytics via Splus Server or SAS Server can not only provide greater visibility and wider application of your work, but also substantially increase the productivity and efficiency of users. This presentation discusses a class of market analytics ranging from simple cross tabulations to advanced statistical analysis. A demo of some analytics that won an Innovation Award from Insightful will be given, followed by a discussion of executive dashboards and a demo of a sample dashboard.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 6th June 2006
4:00pm
Faculty of Psychology, University of Barcelona
Testing models for multivariate categorical data: Implications for IRT research

The talk reviews recent developments in the area of goodness-of-fit testing of multivariate categorical data models. It begins by reviewing the classical statistics: Pearson's X² and the likelihood ratio test statistic G². Since the asymptotic p-values for these statistics are inaccurate when the contingency table is sparse, we discuss alternatives: testing solely for relative fit using the likelihood ratio statistic, pooling cells, resampling methods, and limited information methods.

Limited information test statistics can be derived for either full information null hypotheses or limited information null hypotheses. We consider both cases. Should an overall goodness-of-fit test indicate that the model fits poorly, it is necessary to assess the source of misfit. We discuss testing the fit of the model in subtables, that is, tests for single variables, pairs of variables, and triplets.

Also, it is sometimes of interest to investigate the extent to which the model cross-validates in holdout samples. We present recent developments in this area as well.
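The two classical statistics reviewed at the start of this abstract can be computed directly from observed and model-expected cell counts. The sketch below uses made-up counts for a three-cell table; the function names are illustrative.

```python
import math

def pearson_x2(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def likelihood_ratio_g2(observed, expected):
    """Likelihood ratio statistic: 2 * sum of O * log(O / E) over cells.
    Cells with a zero observed count contribute nothing."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

observed = [10, 20, 30]          # hypothetical observed counts
expected = [20.0, 20.0, 20.0]    # hypothetical counts expected under the model
x2 = pearson_x2(observed, expected)
g2 = likelihood_ratio_g2(observed, expected)
print(round(x2, 3), round(g2, 3))
```

Both statistics are referred to the same chi-square distribution asymptotically, but they can differ noticeably in sparse tables, which is the situation where the abstract notes that their p-values become unreliable.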

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 18th May 2006
4:00pm
Statistics Department, Oregon State University
Unbiased and efficient estimation functions for correlated data with missing at random
We develop a consistent and highly efficient marginal model for missing at random data using an estimating function approach. Our approach differs from the weighted estimating equations of Robins et al. (1995) and from the imputation method in that it does not require knowing the missing-data mechanism, and does not require estimating the probability of missingness or the missing response based on an assumed model. Our method is based on an aggregate-unbiased estimating function approach, which is equivalent to the score method if the likelihood is known. The inverse weighting estimating function method is based on a pattern-unbiased equation. The aggregate-unbiased approach requires a weaker criterion than the pattern-unbiased approach. Therefore, the most efficient estimating function based on aggregate unbiasedness is more efficient than the pattern-unbiased one. We show how to generate unbiased and efficient estimating functions based on aggregate unbiasedness, and give the necessary conditions for our approach. Simulations and an HIV data example will be used to compare the three approaches. This is joint work with Bruce Lindsay and Lin Lu.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 28th April 2006
4:00pm
Department of Mathematics and Statistics, McGill University.
Correlation random fields, brain connectivity, and cosmology
We are all familiar with the correlation coefficient between two sets of numbers. Now suppose we replace the numbers by images in any number of dimensions. The correlation random field is the 'image' of correlations at all possible pairs of points in the two images. We are interested in the topology of the (random) set of high correlations, more specifically, the Euler characteristic (EC). Strangely enough, the statistical properties of the EC can be used to detect connectivity in the images, that is, regions of high correlation. We apply this idea to resting state fMRI images of brain activity, brain damage due to non-missile trauma, and connections between MS lesions and cortical thickness. The same methods are used in cosmology to look for large-scale structure in the universe and for anomalies in the cosmic microwave background from the big bang. We apply this to the latest results from the Sloan Digital Sky Survey and the Wilkinson Microwave Anisotropy Probe.
BRG
Chan Auditorium at CMMT/BCRI, 950 W 28th Ave, Vancouver
Fri 31st March 2006
1:00pm
Special session

1:30-2:30 Applications of Linear and Non-linear Mixed Models in Health Research

Dr. Rollin Brant
Department of Statistics, University of British Columbia
Centre for Child Health Research, Child and Family Research Institute

Some of my more interesting and challenging joint projects have involved the use of mixed-effect non-linear models. These models require a lot of thought about parameterisation, a lot of computational effort and care, and, at the end, a lot of care in interpreting estimates.

The first project began with Dr. Jacek Kopec, an expert in salumetrics at the Arthritis Research Centre of Canada the Arthritis Foundation, who collected data from 601 subjects using a complex questionnaire aimed at understanding how people value states of health according to degree and type of disability. A total of 227 possible health states were described in a standardised manner in terms of vision, ambulation, dexterity, emotion, cognition and pain. Each respondent evaluated a selection of 35 of these health states on a 100-point "Health Thermometer", resulting in 19,232 (usable) valuations. We applied linear and non-linear mixed effects regression to preference and derived utility scores and made comparisons with the findings of previous investigators, who followed a different experimental design and analysis plan.

The second project involved Dr. Anton Miller, a pediatrician and scientist at the Child and Family Research Institute. Dr. Miller and collaborators obtained data on health care services utilisation for 3271 children who had been prescribed Ritalin, to see whether utilisation declined after beginning Ritalin therapy and whether it increased again after cessation of therapy. The condensed nature of the data provided for each child precluded the application of standard event-rate models, leading us to fit a non-linear, over-dispersed, correlated count model, with some surprising results.

3:00-4:00 Link-tracing designs for studies of hidden human populations

Dr. Steve Thompson, SFU

Hidden human populations such as injection drug users, commercial sex workers, and others at risk for HIV/AIDS are difficult to reach by conventional sampling methods. Link-tracing designs, in which social links are followed from sample respondents to find more members of the hidden population to add to the sample, often provide the only practical way to obtain a sample with enough people for the study. At face value, such samples are not necessarily representative of the larger at-risk population of interest, because highly linked people are more likely to be included in the sample while people with few social connections tend to be underrepresented.

In this talk I'll give an overview of traditional link-tracing designs such as snowball sampling and random walk designs, and describe new designs including targeted walks and adaptive web sampling. Design-unbiased and Bayes inference methods for estimating characteristics of the hidden population from the samples will be described as well.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 21st March 2006
4:00pm
Department of Statistics & Computer Science, UBC
Algorithms and applications for Dynamic Bayesian networks
Dynamic Bayesian Networks (DBNs) are a compact representation of structured stochastic dynamical systems. Some well-known special cases include hidden Markov models (HMMs) and linear dynamical systems (LDSs, also called state space models). In this talk, I will give a few samples of the work I have done in this area, ranging from applications to algorithms. The first application is concerned with figuring out where you are in the environment using a head-mounted camera. The second application is concerned with segmentation and classification of array CGH (comparative genomic hybridization) data. On the algorithms front, I will describe how to extend the structural EM algorithm to learn the topology of a network from time series data. Finally, I will describe how to trade off space for time when performing offline smoothing in large, discrete-state models (this turns out to be an important subroutine in the SEM algorithm).
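For the simplest special case named in this abstract, the hidden Markov model, the likelihood of an observation sequence can be computed with the forward recursion. The sketch below uses a hypothetical two-state, two-symbol model; it is a generic textbook illustration and not the smoothing or structure-learning algorithms discussed in the talk.

```python
from itertools import product

def forward_likelihood(obs, init, trans, emit):
    """Probability of an observation sequence under a discrete HMM,
    computed with the forward recursion."""
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical model parameters
init = [0.6, 0.4]                   # initial state probabilities
trans = [[0.7, 0.3], [0.2, 0.8]]    # trans[r][s] = P(next state s | state r)
emit = [[0.9, 0.1], [0.3, 0.7]]     # emit[s][o] = P(symbol o | state s)

lik = forward_likelihood([0, 1, 1], init, trans, emit)
# Sanity check: likelihoods of all length-3 sequences should sum to one
total = sum(forward_likelihood(list(seq), init, trans, emit)
            for seq in product([0, 1], repeat=3))
print(lik, total)
```

The recursion stores one number per state at each step, so its cost grows linearly in the sequence length rather than exponentially in the number of hidden paths.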
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 9th February 2006
4:00pm
Dr. Penny Brasher
Centre for Clinical Epidemiology & Evaluation, Vancouver General Hospital
The Trouble with Surrogate Endpoints
Surrogate endpoints are often employed in clinical trials to reduce the duration and/or size of a trial in comparison to using the clinical endpoint of real interest. Prentice (Statistics in Medicine 1989) developed criteria to validate surrogate endpoints in phase 3 trials, criteria that, in my experience, are seldom employed by the medical community. In this talk I will review the issues associated with the use of surrogate endpoints and some of the statistical contributions that have been made since Prentice's original paper. Finally, I will describe my experiences from a randomized trial of external beam radiotherapy versus cryoablation in the treatment of localized prostate cancer, a trial that employed PSA (prostate-specific antigen) as a surrogate endpoint.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 2nd February 2006
4:00pm
Department of Statistics, UBC
A Novel Meta-analysis of Disparate Datasets in Stem Cell Culture
The recent emergence of microarrays allows the expression levels of thousands of genes to be captured at once. The definition of what is considered to be an interesting gene within an experiment can often be more complex than simply requiring differential expression across two conditions, populations or treatments. With these complex definitions, p-values are often an index of significance that is too focused on single parameters or on simple tests involving multiple parameters. We present an index of significance for the statistical certainty that a gene meets our definition of interesting within an experiment, which can be estimated using a semi-parametric bootstrap.

Also, given the rapid production of diverse microarray datasets, it would be useful to combine results from different experiments. However, it is often the case that different statistical models are used, giving rise to distinct definitions of 'interesting' within each experiment. In general, this precludes many conventional forms of meta-analysis that aim to directly combine estimates or p-values. We present a novel form of meta-analysis, whereby experiments of differing structures or designs may be more easily combined using the chosen index of significance, to select an optimal set of interesting genes across disparate datasets.

