Seminars
Statistics
The Diamond Family Theatre, BC Cancer Research Center
Fri 7th December 2007
12:30pm
Jiahua Chen (UBC) and Thomas M. Loughin (SFU)
Joint UBC/SFU/BRG seminar
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 4th December 2007
11:00am
Institute of Statistical Mathematics
4-6-7 Minami-azabu, Minato-ku,
Tokyo 106-8569 Japan
Measuring conditional dependence with positive definite kernels
We propose a new measure of conditional dependence of random variables,
based on normalized cross-covariance operators on reproducing kernel Hilbert
spaces. Unlike previous dependence measures with positive definite kernels,
the proposed criterion does not depend on the choice of kernel in
population, for a wide class of kernels. At the same time, it has a
straightforward empirical estimate which is consistent. In the special case
of unconditional dependence, the measure is exactly the same as the mean
square contingency, which is one of the popular measures of dependence. We
discuss the theoretical properties of the measure, and demonstrate its
application in experiments.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 29th November 2007
4:00pm
Jennifer Clarke
Department of Epidemiology and Public Health
University of Miami School of Medicine, USA
An Ensemble Approach to Improved Prediction from Multitype Data
We have developed a strategy for the analysis of newly available binary data to improve outcome predictions based on existing data (binary or non-binary). Our strategy involves two modeling approaches for the newly available data, one combining binary covariate selection via LASSO with logistic regression and one based on logic trees. The results of these models are then compared to the results of a model based on existing data with the objective of combining model results to achieve the most accurate predictions. The combination of model predictions is aided by the use of support vector machines to identify subspaces of the covariate space in which specific models lead to successful predictions. We demonstrate our approach in the analysis of single nucleotide polymorphism (SNP) data and traditional clinical risk factors for the prediction of coronary heart disease.
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 27th November 2007
11:00am
Statistics Department
UBC
Digesting Model Outputs: a Predictivist View
The standard approach to reporting the results of model estimation is typified by the table of "regression output" that results from fitting logistic, survival, and a variety of related models. The emphasis in these tables on parameter estimates, standard errors, etc. has been criticized (if not condemned) as marginal to the primary aims of statistics by adherents of the predictive approach. By contrast, predictivists stress the importance of framing inferences in terms that most directly relate to "observables" - for example, fitted survival curves in the case of survival analysis. In this talk I’ll describe and compare subject-specific and population-averaged implementations of the predictive approach in regression-type models.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 22nd November 2007
4:00pm
Yinshan Zhao
Research Associate
MS/MRI research group
Department of Medicine, UBC
Statistical analyses of MRI data from patients with multiple sclerosis
Multiple sclerosis (MS) is a chronic degenerative disease of the central nervous system. The neuropathological hallmark is the presence of multifocal regions of inflammatory demyelination. Magnetic resonance imaging (MRI) provides a window on the brain and spinal cord, enabling identification of inflammatory activities (lesions). MRI measurements commonly used in MS include lesion volume, lesion counts, and brain volume fraction. I will present three studies with MRI data from clinical trials in MS: (1) validating MRI measurements as biomarkers of clinical outcomes; (2) assessing agreement of lesion counts obtained on two different occasions; (3) assessing short-term change in monthly gadolinium-enhancing lesion activity. For each study, I will outline the analysis methods and address some of the statistical issues.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 20th November 2007
11:00am
Department of Statistics and Actuarial Science
University of Waterloo
Waterloo, Ontario
CANADA N2L 3G1
Kernel-Induced Classification Trees and Random Forests
Motivated by the success of support vector machines (SVM), a recursive-partitioning procedure using kernel functions is proposed for classification problems. We call it KICT: kernel-induced classification trees. Essentially, KICT uses kernel functions to construct CART models. The resulting model can perform significantly better in classification than the original CART model in many situations, especially when the pattern of the data is non-linear. We also introduce KIRF: kernel-induced random forests. KIRF compares favorably to random forests and SVM in many situations. KICT and KIRF also largely retain the computational advantage of CART and random forests, respectively, in contrast to SVM. We use simulated and real-world data to illustrate their performance. We conclude that the proposed methods are useful alternatives and competitors to CART, random forests, and SVM.
Keywords: classification tree, feature space, kernel function, random forest, split rule, support vector machine
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 13th November 2007
11:00am
Dipartimento di Statistica
Università Cà Foscari di Venezia
San Giobbe, Cannaregio 873
30121 Venezia (ITALIA)
Local Depth
Data depth is a general framework for nonparametric analysis of multivariate numerical data. For a given probability distribution F on p-dimensional Euclidean space, the depth function d(x;F) measures the centrality of each point x with respect to F. Applications (Liu, Parelius and Singh, 1999) include ordering sample data according to depth ranks, estimation of the center and of generalized quantile surfaces based on depth contours, and investigation of the shape of F through graphical displays (sunburst plot, DD plot, scale curve).
The present research centers on local features of a probability distribution, such as multimodality, contamination and mixtures or, in the sample case, clustering and stratification. Our approach is based on simplicial depth.
Let T(F) be a scalar functional of the random simplex S(F;p+1) able to describe its size, such as the diameter or the volume. For any 0<q<1, the q-local depth of x is ld(x,q;F) = Pr(x belongs to S(F;p+1) and T(F) < t(q)), where t(q) is the quantile of order q of T(F). When q goes toward 0, it measures the centrality of x within a family of neighbourhoods with infinitesimal radius, whereas, when q goes toward 1, it measures the global centrality of x, without restrictions on the neighbourhood radius. Several applications of local depth can be envisaged, most notably to clustering and scaling problems.
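To make the definition concrete, the following is a minimal sketch of a sample version of the q-local depth in the univariate case (p = 1), where the random simplex is the segment between two observations and its size T is the segment length. The function name `local_depth_1d`, the simple order-statistic rule for the empirical quantile, and the use of "<=" rather than "<" in the size condition are assumptions made for illustration, not the speaker's implementation.

```python
from itertools import combinations


def local_depth_1d(x, sample, q):
    """Sample q-local simplicial depth of x for univariate data (illustrative sketch)."""
    # In one dimension every "simplex" is the segment spanned by a pair of points.
    pairs = list(combinations(sample, 2))
    lengths = sorted(abs(a - b) for a, b in pairs)
    # Empirical quantile of order q of the segment lengths (assumed rule).
    k = max(0, min(len(lengths) - 1, int(q * len(lengths)) - 1))
    t_q = lengths[k]
    # Count segments that contain x and are no longer than the threshold t_q.
    hits = sum(
        1
        for a, b in pairs
        if min(a, b) <= x <= max(a, b) and abs(a - b) <= t_q
    )
    return hits / len(pairs)


if __name__ == "__main__":
    bimodal = [0, 1, 2, 10, 11, 12]
    for q in (0.4, 1.0):
        print(q, local_depth_1d(1, bimodal, q), local_depth_1d(6, bimodal, q))
```

On this two-cluster sample, the global version (q = 1) gives the same depth to the point 6, which lies in the gap, as to the point 1, which lies inside a cluster, while the local version (q = 0.4) gives the point in the gap depth zero.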
Practical implementation of local depth requires efficient algorithms to check membership of points in a simplex and to compute diameter or volume. Both problems are non-trivial with the usual sample sizes and high dimension. We refer to Simonovits (2003) for a recent review of existing algorithms.
In this talk, after introducing the general definition, I present results regarding the properties of the local depth in the univariate case. Examples from univariate, bivariate and circular datasets are illustrated.
References
- Liu, R.Y., Parelius, J.M., Singh, K.: Multivariate analysis by data depth: Descriptive statistics, graphics and inference. The Annals of Statistics 27 (1999) 783-858
- Simonovits, M.: How to compute the volume in high dimension? Mathematical Programming B 97 (2003) 337-374
Social Lounge at St. John's College, 2111 Lower Mall, UBC
Fri 9th November 2007
4:00pm
Professor, Department of Statistics, Rutgers University
PNW Talk 2: Frequentist Coverage of Bayes Credible Regions
Presentation slides are available for download.
For estimating a positive normal mean, Zhang and Woodroofe (2003) as well as Roe and Woodroofe (2000) investigate 100(1-α)% HPD credible sets associated with priors obtained as the truncation of noninformative priors onto the restricted parameter space. Namely, they establish the attractive lower bound of (1-α)/(1+α) for the frequentist coverage probability of these procedures. In this work, we establish that the lower bound of (1-α)/(1+α) is applicable for a substantially more general setting with underlying distributional symmetry, and present various illustrations and related properties. Investigations of non-symmetric models are carried out and similar results are obtained.
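As a quick numerical illustration of the lower bound (1-α)/(1+α) quoted in this abstract, the short sketch below evaluates it for a few credibility levels. The function name `coverage_lower_bound` is a hypothetical helper introduced here; the code only evaluates the stated formula and does not reproduce any of the talk's derivations.

```python
def coverage_lower_bound(alpha):
    """Evaluate the lower bound (1 - alpha) / (1 + alpha) on frequentist coverage."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return (1.0 - alpha) / (1.0 + alpha)


if __name__ == "__main__":
    # Nominal credibility 1 - alpha next to the guaranteed minimum coverage.
    for alpha in (0.01, 0.05, 0.10):
        print(f"alpha={alpha:.2f}  nominal={1 - alpha:.3f}  "
              f"bound={coverage_lower_bound(alpha):.4f}")
```

For a nominal 95% credible set (α = 0.05), the bound is about 0.905, so the guaranteed frequentist coverage sits somewhat below the nominal level.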
This talk is a part of
PNW Statistics Meeting.
Social Lounge at St. John's College, 2111 Lower Mall, UBC
Fri 9th November 2007
2:30pm
Research Scientist, Avaya Labs Research
PNW Talk 1: Hunting for the Root Cause of Robotic Voice
Presentation slides are available for download.
This talk is the story of an investigation into an industrial problem. Users of VoIP (Voice over IP) were complaining about intermittent episodes of robotic voice. End-to-end VoIP and network routing data were collected between endpoints distributed throughout the network, and a pair of routers was identified as the root cause of the problem. This exciting investigation into a typical industrial problem requires networking and statistical considerations that I will describe in more detail.
This talk is a part of
PNW Statistics Meeting.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 30th October 2007
11:00am
Department of Statistics and Actuarial Science
Simon Fraser University
Burnaby, BC, Canada
Statistical Inference for Dynamic models with the Generalized Profiling Method
Dynamic models, usually written in the form of differential equations (DEs), describe the rate of change of a process. They are widely used in medicine, engineering, ecology and a host of other applications. One central and difficult problem is how to estimate DE parameters from noisy data. We have developed the generalized profiling method to solve this problem. DE solutions are approximated by nonparametric functions, which are estimated by penalized smoothing with a DE-defined penalty. The computation is much faster than other methods. A modified delta method is proposed to estimate variances of DE parameters, which includes all the uncertainty of the smoothing process. I will demonstrate our method by estimating a predator-prey dynamic model and gene regulatory networks. The generalized profiling method can also be used to estimate other statistical models with nuisance parameters.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 23rd October 2007
11:00am
University of British Columbia
Computer Science Building
2366 Main Mall
Vancouver BC V6T 1Z4
Generalized Polya Urn for Time-varying Dirichlet Process Mixtures
Dirichlet Process Mixtures (DPMs) are a popular class of statistical models to perform density estimation and clustering. However, when the data available have a distribution evolving over time, such models are inadequate. We introduce here a class of time-varying DPMs which ensures that at each time step the random distribution follows a DPM model. Our model relies on an intuitive and simple generalized Polya urn scheme. Inference is performed using Markov chain Monte Carlo and Sequential Monte Carlo. The model is demonstrated on various applications.
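For readers unfamiliar with urn schemes, the sketch below simulates cluster assignments from the standard (time-invariant) Polya urn associated with a Dirichlet process: each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. The time-varying generalization described in the talk is not specified in the abstract and is not reproduced here; the function name `polya_urn_assignments` and the parameter `alpha` are illustrative assumptions.

```python
import random


def polya_urn_assignments(n, alpha, seed=0):
    """Draw n cluster labels from a standard Polya urn (illustrative sketch)."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        # Total weight: i items already placed plus alpha for a new cluster.
        u = rng.uniform(0.0, i + alpha)
        acc = 0.0
        k = len(counts)  # default choice: open a new cluster
        for j, c in enumerate(counts):
            acc += c
            if u < acc:
                k = j
                break
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = polya_urn_assignments(20, alpha=1.0, seed=1)
    print(labels)
    print(counts)
```

Larger values of `alpha` tend to produce more clusters; a time-varying scheme would additionally need a rule for how clusters and their sizes evolve between time steps.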
BRG
James Mather Building, Room 253 in HECP
Thu 18th October 2007
4:00pm
Department of Statistics, UBC
Mixed Effects Models in AIDS Studies with Missing Data
In AIDS studies, mixed-effects (or random-effects) models are useful for analyzing longitudinal data or survival data. In mixed-effects models, correlation within clusters or repeated measurements is incorporated through random effects, which also allow individual-specific inference as well as population inference. Commonly used mixed models include linear mixed models, generalized linear mixed models, nonlinear mixed-effects models, and frailty models. Missing data and measurement errors are very common in AIDS studies. For example, CD4 and viral loads may be measured with errors and patients may drop out of the study early. Ignoring missing data and measurement errors, or using naive methods, may lead to misleading results. We consider appropriate statistical methods for mixed-effects models with incompletely observed data. We illustrate the methods using HIV viral dynamic models.
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 9th October 2007
11:00am
Professor Youngjo Lee
Department of Statistics
Seoul National University, Korea
HGLM Analysis for Disease Mapping
In disease mapping, Bayesian methods have been widely used together with software such as WinBUGS. Hierarchical (or h-) likelihood methods now allow reliable likelihood inferences for spatial models. We compare h-likelihood estimates with Bayesian estimates obtained from software such as WinBUGS. Inferences from the Bayesian and likelihood methods are similar. However, Bayesian estimates are affected by the choice of prior, while the likelihood estimates are not. We study inferences about both fixed and random parameters. For standard error estimates of relative risks, a penalized quasi-likelihood (PQL) method has been developed, which accounts for the variability in the estimation of the hyperparameters. In this talk, we show how the hierarchical likelihood procedure accounts for the inflation of standard error estimates caused by uncertainty in the estimation of fixed parameters. Comparison is made with the prediction intervals from the PQL and Bayesian methods. Through simulation studies, we show that our proposed interval for random parameters maintains the required level.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 2nd October 2007
11:00am
Department of Forest Sciences
University of British Columbia
2404 Main Mall, Vancouver BC V6T 1Z4
Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization
A new method is developed for calculating sequence substitution
probabilities using Markov chain Monte Carlo (MCMC) methods. The basic
strategy is to use uniformization to transform the original continuous
time Markov process into a Poisson substitution process and a discrete
Markov chain of state transitions. An efficient MCMC algorithm for
evaluating substitution probabilities by this approach using a
continuous gamma distribution to model site-specific rates is
outlined. The method is applied to the problem of inferring branch
lengths and site-specific rates from nucleotide sequences under a
general time reversible (GTR) model.
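The uniformization idea can be sketched in a few lines: a continuous-time Markov chain with rate matrix Q is rewritten as a Poisson process of rate mu (at least as large as the largest exit rate) driving a discrete chain with transition matrix R = I + Q/mu, so that P(t) is a Poisson-weighted sum of powers of R. The code below evaluates that series deterministically for a toy two-state chain; it does not implement the MCMC algorithm or the gamma-distributed site rates mentioned in the abstract, and the function name, tolerance and example rates are assumptions made for illustration.

```python
import math


def transition_probabilities(Q, t, tol=1e-12, max_terms=10000):
    """Approximate P(t) = exp(Qt) by uniformization (illustrative sketch)."""
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-Q[i][i] for i in range(n))  # uniformization rate
    if mu <= 0.0 or t == 0.0:
        return identity
    # Discrete-time transition matrix of the uniformized chain.
    R = [[identity[i][j] + Q[i][j] / mu for j in range(n)] for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    power = identity                 # R^k, starting at k = 0
    weight = math.exp(-mu * t)       # Poisson(k; mu * t), starting at k = 0
    total = 0.0
    k = 0
    # Add Poisson-weighted powers of R until the weights sum to about 1.
    while total < 1.0 - tol and k < max_terms:
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * power[i][j]
        total += weight
        k += 1
        weight *= mu * t / k
        power = [
            [sum(power[i][m] * R[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return P


if __name__ == "__main__":
    Q = [[-1.0, 1.0], [2.0, -2.0]]   # toy two-state rate matrix
    for row in transition_probabilities(Q, 0.5):
        print(row)
```

A nucleotide model such as GTR would use a 4x4 rate matrix in place of the toy example; for very large mu * t the starting weight exp(-mu * t) underflows, so a production implementation would need a more careful summation.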
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 25th September 2007
11:00am
Department of Statistics
UBC
Practical Issues in Monte Carlo Integration
Computing marginal likelihoods to perform Bayesian model selection is
often a challenging task. During this presentation I will discuss the
importance of appropriately fine-tuning Monte Carlo integration methods to
estimate marginal likelihoods. I will focus more particularly on path
sampling, which is recognized as one of the most powerful methods for this
purpose. I will begin by showing the potentially very influential impact
of two tuning parameters of path sampling: the choice of the importance
density and the specification of the grid. I will then provide practical
suggestions to select them.
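As a toy illustration of path sampling and of the role played by the grid, the sketch below estimates the log ratio of two normalizing constants along a geometric path between two unnormalized Gaussian densities, exp(-x^2/2) and exp(-x^2/(2 sigma^2)), for which the exact answer is log(sigma). Because each intermediate density is itself Gaussian, draws are taken exactly, so no importance density is needed here, unlike in the realistic settings the talk addresses. The function name, grid size, number of draws and trapezoidal rule are all assumptions made for this example.

```python
import math
import random


def path_sampling_log_ratio(sigma, grid_size=21, draws=2000, seed=0):
    """Estimate log(Z1 / Z0) by path sampling on a toy Gaussian problem (sketch)."""
    rng = random.Random(seed)
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # U(x) = log q1(x) - log q0(x) = c * x^2 for the two unnormalized densities.
    c = 0.5 * (1.0 - 1.0 / sigma ** 2)
    means = []
    for theta in thetas:
        # The geometric path q0^(1-theta) * q1^theta is Gaussian with this precision.
        precision = (1.0 - theta) + theta / sigma ** 2
        sd = 1.0 / math.sqrt(precision)
        # Monte Carlo estimate of E_theta[U(x)] from exact draws.
        mean_u = sum(c * rng.gauss(0.0, sd) ** 2 for _ in range(draws)) / draws
        means.append(mean_u)
    # Trapezoidal rule over the grid approximates the integral over theta in [0, 1].
    h = 1.0 / (grid_size - 1)
    return h * (0.5 * means[0] + sum(means[1:-1]) + 0.5 * means[-1])


if __name__ == "__main__":
    estimate = path_sampling_log_ratio(sigma=2.0)
    print(estimate, math.log(2.0))
```

Coarsening the grid (a smaller `grid_size`) increases the discretization error of the trapezoidal rule, which is one way to see why the grid specification matters.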
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 28th August 2007
3:00pm
Bayesian Adjustment for Exposure Misclassification in Case-Control Studies
Measurement error in the explanatory variable occurs frequently in observational studies. Error-prone observations may lead to biased estimation and loss of power in detecting the impact of the exposure variable on the response. Lack of knowledge of the mechanism of measurement error can make it difficult to adjust for mismeasurement. In this project, we consider situations with a correctly specified binary response and a misclassified binary exposure. We propose Bayesian adjustment to correct for measurement error subject to varying differentiality. Exposure prevalences and misclassification parameters are assigned prior distributions. Internal validation data are utilized to ensure the resulting model is identifiable. We show that the Bayesian and MLE models produce accurate and similar estimates of the association under differential and nondifferential misclassification. In addition, Bayesian methods are particularly useful in the face of uncertainty about whether exposure misclassification is differential.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 28th August 2007
3:00pm
MSc (2007) Department of Statistics, UBC
Normalization for Quantitative Phenotypic Studies in Genome-Wide Gene Deletion and Inhibition
High-throughput phenotypic experiments include both deletion sets and RNAi experiments. They are genome-wide and require much physical space. As a result, multiple plates are often required in order to cover the whole genome. The use of multiple plates leads to systematic plate-wise experimental artefacts, which impede statistical inference. We review current pre-processing methodology. Its fundamental principle is to align a common feature shared by all plates. From this very principle, we propose an improved method which simultaneously estimates all parameters required for the pre-processing transformation.
Some of the alignment features popular today implicitly assume conditions which are often not met in practice. We discuss the various choices of features to align. Specifically, the upper quantiles and the mean of the left tail trimmings of each plate's data distribution are features which are always available and simple to obtain. Moreover, they are robust to non-randomization of genes to plates. Their use will be motivated through simulation and applied to real data.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 9th August 2007
3:00pm
Jesse Raffa
PhD Candidate, University of Waterloo
Longitudinal Analyses of Medication Adherence Data in HIV-Infected Illicit Drug Users
Adherence to therapy (the fraction of medication doses taken correctly) is a factor that is extremely important for successful therapeutic outcomes in many settings, but is particularly important for HIV medication, where non-adherence can lead to the development of drug resistance. Within a cohort of HIV-infected illicit drug users receiving methadone maintenance therapy in a directly observed therapy (DOT) program for the treatment of HIV, we investigate a variety of issues related to adherence. Previous examinations of these issues were typically done using a cross-sectional approach to the data analysis. This approach may be inefficient and prone to misclassification of exposure to periods of non-adherence when relating it to therapeutic outcomes. A longitudinal data analysis may be more efficient and robust against misclassification of periods of non-adherence. We use generalized estimating equations (GEE) to estimate model parameters under Gaussian, binomial and Poisson models. Lastly, I comment on my experience of working with a small local research group.
This is joint work with the PCHC Research Group (specifically: Dr. Brian Conway, Dr. Jason Grebely, Dr. Harout Tossonian and many others) and Dr. John Petkau.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 7th August 2007
3:45pm
Graduate Student,
Department of Statistics, UBC
Average Effects for Regression Models with Misspecifications and Diffuse Interaction Models
In epidemiological studies, how best to assess and interpret interaction of risk factors of interest has been the subject of a lively debate. In statistical regression models, the interaction between two putative risk factors is described by the regression coefficient of the product of the risk factors. What happens if a linear regression model without pairwise interaction terms is used to fit data actually generated from a linear regression model with all possible pairwise interactions? We apply the idea of average effect to evaluate the consequences of misspecified models and find that the average effect estimates are still consistent if the joint distribution of risk factors satisfies certain conditions. It is known that pairwise interaction models encounter intractable problems, especially when the number of risk factors under consideration is quite large. The number of pairwise interaction terms is p(p-1)/2 if the number of risk factors is p. As an alternative strategy, we introduce a diffuse interaction model with only one parameter to reflect the interactions among all the risk factors, without specifying which of the risk factors do indeed interact. We compare the two kinds of interaction models in terms of ability to detect interactions. Another issue investigated in the thesis is how to devise MCMC algorithms to estimate diffuse interaction models. This is done not only for the diffuse interaction model assuming all risk factors interact in the same direction, either synergistically or antagonistically, but also for extended diffuse interaction models which relax this strong assumption.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 7th August 2007
3:00pm
Graduate Student,
Department of Statistics, UBC
Bayesian propensity score analysis for observational data
In the analysis of epidemiological data, stratifying study subjects on the estimated propensity scores can reduce confounding bias from measured variables. Confidence intervals for the treatment effect are typically calculated without acknowledging uncertainty in the estimated propensity scores, and intuitively this may yield inferences which are falsely precise. We propose a Bayesian method which models the propensity score as a latent variable. Markov chain Monte Carlo is used for posterior simulation. We study the impact of modelling uncertainty in the propensity scores in a case-study investigating the effect of statin therapy on mortality in Ontario patients discharged from hospital following acute myocardial infarction. A feature of our method is that it fits regression models for the outcome variable and the propensity score simultaneously rather than one at a time. Using simulations we demonstrate that if the model for the outcome is correct, meaning that mortality risk is constant within subclasses of the propensity score, then our method permits more efficient estimation of the propensity scores. If the model is incorrect then performance will tend to deteriorate, although treatment effect estimation is largely unaffected. To empirically investigate the modelling assumptions for mortality risk in the case-study, we study predictive performance using cross-validation.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 2nd August 2007
3:00pm
Graduate Student,
Department of Statistics, UBC
Adaptive Likelihood Weights and Mixtures of Empirical Distributions
Suppose that you must make inference about a population, but that data from m-1 similar populations are available. The weighted likelihood uses exponential weights to include all the available information into the inference. The contribution of each datum is discounted based on its dissimilarity with the target distribution.
One could hope to elicit the likelihood weights from scientific information, but using data-based weights is more pragmatic. To this day, no entirely satisfactory method has been found for determining likelihood weights from the data.
We propose a way to determine the likelihood weights based on data. The suggested "MAMSE" weights are nonparametric and can be used as likelihood weights, or as mixing probabilities to define a mixture of empirical distributions. In both cases, using the MAMSE weights allows strength to be borrowed from the m-1 similar populations whose distribution may differ from the target.
The MAMSE weights are defined for different types of data: univariate, censored and multivariate. In addition to their role for the likelihood, the MAMSE weights are used to define a weighted Kaplan-Meier estimate of the survival distribution and weighted coefficients of correlation based on ranks. The maximum weighted pseudo-likelihood, a new method to fit a family of copulas, is also proposed.
All these examples of inference using the MAMSE weights are shown to be asymptotically unbiased. Furthermore, simulations show that inference based on MAMSE-weighted methods can perform better than their unweighted counterparts. Hence, the adaptive weights we propose successfully trade bias for precision.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 19th July 2007
4:00pm
Department of Statistics
University of Auckland, NZ
Response-selective sampling designs
Sampling units for a regression analysis at least partially on the basis of the response variable is a form of biased sampling that can severely distort the apparent relationship between explanatory and response variables. And yet such sampling designs can be extraordinarily effective and efficient research tools. The simplest example of a response-selective design is the case-control study for a binary response variable. Here cases (Y=1, e.g. people with a disease of interest) are sampled at a much higher rate than people without the disease (Y=0, controls). If the condition under study is relatively rare, this is enormously more efficient than simply drawing a random sample from the population. Case-control studies are "perhaps the dominant form of analytical research in epidemiology". Latterly, there has been a plethora of related designs including case-supplemented designs, case-augmented designs, two- and multiphase case-control designs, case-cohort designs and nested case-control designs (in a survival analysis context). We will discuss such sampling designs and methods of analysis.
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 10th July 2007
4:00pm
Imperial College London
Chair in Biostatistics
Division of Epidemiology, Public Health and Primary Care
Bayesian analysis of gene expression data
Microarray experiments and gene expression data have a number of characteristics that make
them attractive but challenging for Bayesian analysis. There are many sources of variability,
the variability is structured at different levels (array specific, gene specific, ...) and the
ratio of signal to noise is low. Typical experiments involve few samples but a large number
of genes, so that borrowing information, e.g. across genes, to improve inference becomes
essential. Hence embedding the inference in a hierarchical model formulation is natural.
Bayesian models adapted to the level of information processed have been developed to address
some of the questions raised that range from modelling the signal measured by GeneChip
arrays to synthesising gene lists across different experiments. In this talk, I shall discuss
and illustrate their use in a variety of contexts: probe level models attempting to quantify
uncertainty of the signal, tail posterior probabilities for analysing pairwise and multiclass
data and the formulation of simple procedures for finding a list of features that are commonly
perturbed in two or more experiments.
Papers and technical reports at www.bgx.org.uk
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 17th May 2007
4:00pm
School of Public Health,
Yale University
CANCELLED Joint Modeling of Time Series Measures and Recurrent Events and Analysis of the Effects of Air Quality on Respiratory Symptoms
Exposure to ambient pollutants at concentrations above defined standards is a risk factor for respiratory symptoms, especially in sensitive children. Many studies have been undertaken to monitor air quality and to assess its association with respiratory symptoms. We propose a joint mixed effects regression model of time series measures and recurrent events to analyze the air quality and respiratory symptom data from the Yale Mothers and Infants Health Study. Three mothers' symptoms (runny nose, cough, and sore throat) and three infants' symptoms (runny nose, cough, and general sickness) were investigated. To alleviate the computational complexity, a two-stage maximum likelihood based estimation procedure is introduced to estimate the parameters, and simulation studies are conducted to assess the validity of this estimation procedure. Our analysis reveals differences in the etiology of respiratory symptoms between mothers and infants. Most notably, coarse particles of mass between 2.5 and 10 microns in diameter increased the risks of mothers' runny nose and cough symptoms, but had no significant impact on any of the three infants' symptoms. The sulfate level was negatively associated with the risk of infants' runny nose and cough symptoms, but had no significant effect on any of the three mothers' symptoms. A high level of humidity was negatively associated with the mothers' cough incidence, but had no significant association with any of the three infants' symptoms. Such differences not only reveal the sensitivity of the mothers and infants to the air quality, but also call for further understanding of the differences. It is possible that actions taken by mothers to overcome humidity may inadvertently affect the infants. This is joint work with Yuanqing Ye, Peter Diggle, and Jian Shi.
Leonard S. Klinck 460, 6356 Agricultural Road, UBC
Wed 25th April 2007
3:30pm
L.J. Savage Professor of Statistics,
University of Michigan
The 2007-08 van Eeden Lecture: A Kiefer Wolfowitz Comparison Theorem For Wicksell’s Problem
Social Lounge, St. John's College, 2111 Lower Mall, UBC
Tue 24th April 2007
4:00pm
L.J. Savage Professor of Statistics,
University of Michigan
The 2007-08 van Eeden Lecture: Shape Restricted Estimation In the Search for Dark Matter
Images Theatre, SFU
Fri 20th April 2007
9:30pm
John Spinelli, Federico O'Reilly, John Petkau, Ted Anderson
Day 2 of 2.
PNW Meeting. Statistical Distributions and Models: Assessment and Applications
Images Theatre, SFU
Thu 19th April 2007
1:00pm
David Brillinger, Louis-Paul Rivest, Jerry Lawless, Richard Lockhart
Day 1 of 2.
PNW Meeting. Statistical Distributions and Models: Assessment and Applications
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 10th April 2007
4:00pm
Business School
Loughborough University
Cluster Analysis: The clustering-function-based method via sign eigenanalysis
Consider a hierarchical divisive clustering problem with N objects measured on p variables. During each stage of the clustering, a selected group of n (n <= N) objects is to be divided into two sub-groups. Motivated by the MANOVA model and Fisher's linear discriminant analysis, this problem is formulated as a least squares optimization problem, simultaneously solving for both an unknown group membership vector (having entries of 1 or -1) and a linear clustering function. It is shown that the optimal partition characterized by the group membership vector is a sign eigenvector of a modified hat matrix associated with the largest sign eigenvalue. The optimal partition can thus be attained via an algorithm for sign eigenanalysis. Two case studies involving gene expression data analysis are considered to illustrate the developed method.
Math Annex Room 1100, PIMS-UBC
Mon 2nd April 2007
3:00pm
University of Chicago
Statistical Models for Global Processes
Show Abstract
PIMS 10th Anniversary Speaker Series 2007
Location: Math Annex Room 1100
This talk explores some of the issues that arise in statistical modeling of atmospheric phenomena on a global scale, using total column ozone as measured by the satellite-based Total Ozone Mapping Spectrometer (TOMS) as a case study. A basic issue in all statistical models for natural phenomena is finding statistical regularities that enable one to take meaningful averages. Since the statistical characteristics of total column ozone strongly depend on latitude, we consider the use of axial symmetry (invariance of statistical properties to rotations about the Earth's axis) as a possible exploitable regularity. Methods for summarizing, modeling, estimating and visualizing spatial dependence for axially symmetric processes are addressed. A computationally convenient approach to modeling using truncated expansions of spherical polynomials is shown to capture much of the larger-scale latitudinal variation in spatial dependence. However, the approach performs disastrously in terms of describing the local behavior of the process, leaving a need for the development of statistical models that provide good descriptions of the data and computational methodologies that allow one to fit these models with reasonable degrees of statistical efficiency. Lessons learned from this only partially successful modeling effort, including suggestions for new data products based on TOMS, are described.
WMAX 110, UBC, PIMS-UBC
Mon 26th March 2007
4:00pm
Dept. Statistics, U. Toronto
The interface between Bayesian and frequentist statistics
Show Abstract
PIMS 10th Anniversary Speaker Series 2007
Location: WMAX 110
Notes: Coffee and refreshments will be served half an hour before the talk.
Abstract: Statistical theory is often categorized as either "Bayesian" or "frequentist", and statisticians often self-identify in the same categories. During the development of theoretical statistics as a separate field in the twentieth century this categorisation led to a great deal of discussion, some of which was surprisingly bitter and antagonistic. With the development of several key results in the asymptotic theory of inference based on the likelihood function, it is becoming clear that the mathematical differences between Bayesian and frequentist methods are rather less important than the philosophical ones. Some of this work is based on efforts to construct priors which minimize the difference between the two approaches and some is based on an ongoing effort to develop so-called 'reference', 'objective' or 'default' priors. Perhaps not surprisingly, even the correct terminology to be used in this setting has been the subject of debate!
I will give an overview of some of the asymptotic theory behind the development of approaches to constructing priors that minimize the differences between Bayesian and frequentist inference, with special attention to 'strong matching' priors that have been developed recently in joint work with Don Fraser and colleagues. The construction of these priors provides some insight into the exact points of departure between Bayesian and frequentist methods, at least from the mathematical point of view. The philosophical debate may well continue for some time.
WMAX 110, UBC, PIMS-UBC
Mon 19th March 2007
4:00pm
Statistical and Applied Mathematics Institute, Duke University
Issues with Bayesian Analysis of Inverse Problems
Show Abstract
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 13th March 2007
4:00pm
Dr Bradley W. Vines
UC Davis Center for Mind and Brain
Applications of Functional Data Analysis in music cognition research
Show Abstract
Many aspects of mind and brain that interest psychologists involve continuous processes. For this reason, and due to the advancement of data collection technology, researchers often use measurements that are sampled over time and space (e.g., brain imaging, movement tracking, and continuous behavioral judgments). Such data present a challenge to traditional statistical techniques that make assumptions about the normal distribution and independence of collected data points. Correlations and regression analyses, for example, summarize the relations between entire data sets without the potential to reveal how those relations evolve over time. Functional Data Analysis (FDA, Ramsay & Silverman, 2002, 2005; Heckman, 2003) is ideal for analyzing data derived from continuous processes. These techniques model data as functions, and can be used to reveal the underlying dynamics that drive a set of measurements. Software and tutorials are available for free download to researchers who are interested in incorporating FDA tools into their analyses (www.functionaldata.org). I will describe some applications of FDA in music cognition research, including smoothing, registration, functional principal components analysis, functional regression analysis, and functional significance testing. I will demonstrate these techniques using data from a study investigating multi-modal perception of musical performances.
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 8th March 2007
4:00pm
University of Washington
Department of Biostatistics
Network discovery through timing maps
Show Abstract
Time-course microarray data consist of mRNA expression from a common set of genes collected at different time points. Such data are thought to reflect underlying biological processes developing over time. In this paper we propose a method to examine gene network relationships using time course microarray data. We assume that a sample of gene expression profiles is a realization of a process where each profile is modeled as a random functional transformation of a common curve. We propose measures of functional similarity and time order based on estimated time transformation functions. This allows for novel inferences on gene networks which take full account of the timing of the functional features of the gene expression profiles. We discuss the application of our model to simulated data as well as to microarray data on prostate cancer progression.
This is joint work with D. Telesca, M. Neira, C. Nelson and M. Gleave.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 6th March 2007
4:00pm
Dept. of Earth & Ocean Sciences and Dept. of Physics & Astronomy
UBC
Extracting climate modes from noisy data
Show Abstract
With very noisy data, overfitting is a serious problem in pattern recognition. For nonlinear regression, having plentiful data eliminates overfitting, but for nonlinear principal component analysis (NLPCA), overfitting persists even with plentiful data. Thus simply minimizing the mean square error (MSE) is not a sufficient criterion for NLPCA to find good solutions in noisy data. A new information criterion is proposed which selects the NLPCA curve (computed using auto-associative neural networks) with the right amount of flexibility so it neither underfits nor overfits. This information criterion also automatically chooses between using an open or a closed curve fit for a dataset.
Nonlinear canonical correlation analysis (NLCCA) can also be performed using neural network models. A more robust version using the biweight midcorrelation instead of the Pearson correlation has been developed to work on noisy data.
These methods are applied to tropical Pacific and equatorial stratospheric climate data.
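For readers unfamiliar with the biweight midcorrelation mentioned above, the following Python sketch shows one standard definition of it: deviations from the median are down-weighted by Tukey's biweight, and points far from the median get zero weight. The function names, the tuning constant 9, and the data are assumptions for illustration; this is not the neural-network NLCCA implementation described in the talk.

```python
import math
import statistics

def biweight_terms(values):
    """Median-centred deviations, down-weighted by Tukey's biweight."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)  # 9 is the conventional tuning constant
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * weight)
    return terms

def biweight_midcorrelation(x, y):
    """Robust analogue of the Pearson correlation for paired samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = biweight_terms(x)
    b = biweight_terms(y)
    numerator = sum(ai * bi for ai, bi in zip(a, b))
    return numerator / math.sqrt(sum(ai * ai for ai in a) * sum(bi * bi for bi in b))

# Hypothetical series: y tracks x except for one gross outlier.
x = [float(i) for i in range(1, 11)]
y = x[:-1] + [1000.0]
robust_r = biweight_midcorrelation(x, y)
```

In this made-up example the outlying value in y receives zero weight, so it does not dominate the correlation the way it would in a Pearson calculation.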
WMAX 110, UBC
Mon 19th February 2007
4:00pm
National Institute of Statistical Sciences
The Reality of Computer Models: Statistics and Virtual Science
Show Abstract
PIMS 10th Anniversary Speaker Series 2007
Computer models are imperfect representations of real phenomena. An austere view is that validating a model cannot be done: the “primary value of models is heuristic: models are representations, useful for guiding further study but not susceptible to proof.” This view may have substantial basis in purely scientific roles, as distinct from a model’s use in policy and engineering contexts. But the real validation issue, we contend, is not whether a model is absolutely correct or only a useful guide. Rather, it is to assess the degree to which it is an effective surrogate for reality: does the model provide predictions accurate enough for intended use?
Incisive argument on the validity of models, seen as assessment of their utility, has previously been hampered by the lack of a structure in which quantitative evaluation of a model’s performance can be addressed. The lack has given wide license to challenge computer model predictions (just what is the uncertainty in temperature predictions connected with increases in CO2?). A structure for validation should
• Permit clear-cut statements on what and how performances are to be addressed and assessed
• Account for uncertainties stemming from a multiplicity of sources including field measurements and, especially, model inadequacies
• Recognize the confounding of calibration/tuning with model inadequacy - tuning can mask flaws in the model; flaws in the model may lead to incorrect values for calibration parameters
We will describe such a structure (and applications). It is built on methods and concepts for the statistical design and analysis of virtual experiments, drawing on elements of Gaussian stochastic processes and Bayesian analysis.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 8th February 2007
4:00pm
Dr. Wei Liu
Department of Statistics, UBC
Semiparametric Nonlinear Mixed-effects Models with Covariate Measurement Errors and Missing Responses
Show Abstract
Semiparametric nonlinear mixed-effects (NLME) models are flexible for modeling complex longitudinal data. Covariates are usually introduced in the models to partially explain inter-individual variations. Some covariates, however, may be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. We propose two approximate likelihood methods for semiparametric NLME models with covariate measurement errors and nonignorable missing responses. The methods are illustrated in a real data example. Simulation results show that both methods perform well and are much better than the commonly used naive method.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 6th February 2007
4:00pm
University of British Columbia
Department of Statistics
Extended Bayesian Information Criteria for Model Selection with Large Model Space
Show Abstract
It has been observed that the ordinary Bayes information criterion is too liberal for model selection when the model space is large. In this presentation, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayes information criteria. Unlike the original Bayes information criterion, which balances the log likelihood by a penalty on the number of unknown parameters, the extended Bayes information criteria take into account both the number of unknown parameters and the complexity of the model space. The consistency of the extended Bayes information criteria is established. Their performance in various situations is evaluated by simulation studies. They are compared with the original Bayes information criterion in terms of positive selection rate and false discovery rate in problems of variable selection. It is demonstrated that the extended Bayes information criteria incur a small loss in positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayes information criteria are extremely useful for variable selection in problems with a moderate sample size but a huge number of covariates, especially in genome-wide association studies, which are now a hot area in genetics research. We further developed some results which may have their own significance.
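To make the contrast between the two criteria concrete, the Python sketch below uses one commonly cited form of the extended criterion, EBIC_gamma = -2 log L + k log n + 2 gamma log C(p, k), where C(p, k) counts the candidate models that use k of p covariates. The abstract does not state a formula, so this form, the function names, and all numbers are assumptions for illustration only; setting gamma = 0 recovers the ordinary criterion.

```python
import math

def log_binomial(p, k):
    """Natural log of the binomial coefficient C(p, k), via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)

def bic(log_lik, k, n):
    """Ordinary BIC: -2 log L plus a penalty on the k fitted parameters."""
    return -2.0 * log_lik + k * math.log(n)

def ebic(log_lik, k, n, p, gamma=1.0):
    """Assumed EBIC form: BIC plus 2 * gamma * log C(p, k)."""
    return bic(log_lik, k, n) + 2.0 * gamma * log_binomial(p, k)

# Hypothetical fits: n = 200 observations, p = 10000 candidate covariates.
n, p = 200, 10000
small_model = {"log_lik": -250.0, "k": 2}
large_model = {"log_lik": -238.0, "k": 6}

bic_prefers_large = bic(n=n, **large_model) < bic(n=n, **small_model)
ebic_prefers_small = ebic(n=n, p=p, **small_model) < ebic(n=n, p=p, **large_model)
```

With these made-up values the ordinary criterion favours the six-covariate model, while the extended criterion, which also charges for the size of the model space, favours the two-covariate model.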
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 25th January 2007
4:00pm
Professor of Psychology, McGill University
Adjunct Professor of Statistics, UBC
Estimating the Quantile Function
Show Abstract
With: Giles Hooker, Assistant Professor of Statistical Science & Biological Statistics/Computational Biology, Cornell University
The quantile function Q(u) is the inverse of the cumulative distribution function F(x); that is, Q[F(x)] = x and F[Q(u)] = u. John Tukey championed its use, pointing out that ordinary folks often present us with a probability u and want to know the event x that is associated with it, rather than with an event whose probability they don't know. Our particular interest is providing helpful information about rainfall on the Canadian prairies, and we want to be able to tell a producer about extremes of precipitation that they will only see, for example, once in a century. We will review the quantile function and its many interesting properties.
Emanuel Parzen and many others have discussed the problem of estimating Q from a sample of data. The definition of a strictly monotone function developed by Ramsay (JRSS-B, 1996) leads to an especially neat formulation of this estimation problem, and to some new approaches. In particular, we are working on the problem of estimating a distributed quantile function Q(u,t,r) where t indexes time and r indexes space. This generalizes the usual data smoothing problem, which only attempts to estimate the expectation of x, and quantile regression, which estimates a single quantile value.
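As a minimal illustration of the inverse relationship Q[F(x)] = x described above, the following Python sketch computes a simple empirical quantile function from a sample. The function name `empirical_quantile` and the rainfall figures are hypothetical, and this step-function estimate is only a baseline; it is not the monotone-function estimator discussed in the talk.

```python
import math

def empirical_quantile(sample, u):
    """Smallest x in the sample whose empirical CDF F(x) is at least u."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    xs = sorted(sample)
    # The empirical CDF at the k-th smallest value is k / n,
    # so we need the smallest k with k / n >= u.
    k = math.ceil(u * len(xs))
    return xs[k - 1]

# Hypothetical annual rainfall totals (mm), for illustration only.
rainfall = [312.0, 287.5, 401.2, 355.0, 298.3, 420.9, 367.4, 330.1]
median_rain = empirical_quantile(rainfall, 0.5)
wettest_year = empirical_quantile(rainfall, 1.0)
```

A sample of this size says nothing reliable about once-in-a-century extremes, which is part of why smoother estimates of Q are of interest.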
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 16th January 2007
4:00pm
CSIRO Mathematical and Information Sciences,
Brisbane, Australia
Statistics Down Under: the lay of the research land in CSIRO Mathematical and Information Sciences
Show Abstract
This is a very general talk aimed at providing insight into statistical research undertaken in CSIRO Mathematical and Information Sciences. I will present an overview of the division before focussing on the environmental statistics theme within the division to highlight what kinds of research problems we tackle and how we go about doing that. The other 3 divisional themes also have statistics in their capability grouping. I’ll conclude by briefly discussing one of my current key projects, namely redesigning the state of Queensland’s ambient aquatic ecosystem monitoring program.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 11th January 2007
4:00pm
Prof Simon Peacock
Dean, Faculty of Science, UBC
Meet the New Dean of the Faculty of Science
Show Abstract
Dr Simon Peacock joined UBC as the Dean of the Faculty of Science on September 1, 2006. This meeting will allow members of the Department and the Dean to get to know each other better. After some introductory remarks, there will be opportunity for open discussion.
Dr Peacock obtained his PhD from UCLA. Since then, his academic career at Arizona State University has included the posts of Chair of the Department of Geological Sciences, Interim Associate Dean for Academic Personnel, and Divisional Dean of Natural Sciences and Mathematics for the College of Liberal Arts and Sciences.
Dr Peacock has an extensive research record in the earth sciences. On this occasion, however, the meeting will not be technical, so please bring along your questions about the issues facing the Department of Statistics, the Faculty of Science, and UBC.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 9th January 2007
4:00pm
Department of Biostatistics
Columbia University
Time-dependent Bivariate Growth Charts
Show Abstract
Growth charts have been widely used in clinics and medical centers to monitor an individual subject's growth status in the context of population values. Typical growth charts consider only one measurement at a time, although it is well recognized that more informative readings can be obtained by considering multiple measurements simultaneously. We propose to construct bivariate growth charts by a nested sequence of time-dependent reference quantile contours on the joint distribution of the bivariate measurements. A two-stage method based upon quantile regression is proposed to estimate such time-dependent bivariate growth charts from reference data with possibly irregular measurement times. The method is also flexible enough to include, whenever necessary, other potentially important covariates, such as family history. The performance of the proposed methodology is demonstrated by a Monte Carlo simulation study, as well as an application to height-weight screening of young children in the United States.