Seminars
BRG
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Thu 6th December 2012
4:00pm
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 27th November 2012
11:00am
Department of Economics
UBC
Local Asymptotic Minimax Estimation of Nonregular Parameters with Translation-Scale Equivariant Maps
When a parameter of interest is defined to be a nondifferentiable transform
of a regular parameter, the parameter does not have an influence function, rendering
the existing theory of semiparametric efficient estimation inapplicable. However, when
the nondifferentiable transform is a known composite map of a continuous piecewise
linear map with a single kink point and a translation-scale equivariant map, this paper
demonstrates that it is possible to define a notion of asymptotic optimality of an
estimator as an extension of the classical local asymptotic minimax estimation. This paper
establishes a local asymptotic risk bound and proposes a general method to construct
a local asymptotic minimax decision.
The paper is available at:
http://faculty.arts.ubc.ca/ksong/onpE11.pdf
BRG
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Thu 22nd November 2012
4:00pm
Department of Statistical & Actuarial Sciences
University of Western Ontario
Evaluating diagnostic tests for Chlamydia trachomatis in the absence of a gold standard
Evaluation of new diagnostic tests in the absence of a gold-standard test
is challenging. Three approaches that have been used for this purpose include:
1) the patient infected status algorithm, 2) the composite reference standard
approach, and 3) traditional latent class models. In addition, we propose
hierarchical latent class models that recognize that tests based on different
biological mechanisms in fact measure different latent variables, which in turn
measure the latent true disease status. Using simulations we illustrate the
drawbacks of the ad-hoc approaches and the advantages of our statistical models.
We apply these models to evaluate screening tests for detecting Chlamydia
trachomatis.
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 20th November 2012
11:00am
Department of Statistics
UBC
Properties and Applications of the Adjusted Empirical Likelihood
Empirical likelihood (EL) is a very useful tool for statistical inference and it has been successfully applied to statistical problems arising in many areas. Inferences based on EL have surprisingly simple asymptotic properties and many desirable finite sample properties. The method is particularly effective at making use of model information through estimating functions. There is, however, an asymptotically negligible issue that can pose a serious challenge in applications. Particularly when a model is over-specified, the estimating equations may not have a solution and the EL is then undefined. Various remedies have been proposed to overcome this difficulty. We propose a simple adjustment to the empirical likelihood (AEL) which makes it well defined at all parameter values and retains first order asymptotic properties. In addition, the AEL possesses many desirable finite sample properties for constructing confidence regions of a population mean. It enables the EL to cover extra ground in statistics. Furthermore, by tuning the level of adjustment, the AEL can obtain confidence regions with a high-order coverage precision.
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 13th November 2012
11:00am
Visiting Professor, UBC.
School of Mathematics & Statistics
Zhejiang University of Finance & Economics
Moving blocks empirical likelihood method in longitudinal data analysis
In longitudinal or panel data analysis, neglecting the serial correlations within subjects can result in inefficient estimation of the parameters. Some panel data sets, like the Penn World Table (PWT), have an appreciable time series dimension as well as a large cross section dimension. When the time series dimension is large, there is a need to consider serial correlations more generally. To accommodate the serial correlations within subjects nonparametrically, we propose a moving blocks empirical likelihood method for general estimating equations. Asymptotic results are derived under sequential limits. Simulation studies are conducted to investigate the finite sample properties of the proposed methods and compare them with the elementwise and subject-wise empirical likelihood method of Wang et al. (2010) and the block empirical likelihood method of You et al. (2006). An application to an AIDS longitudinal study is presented.
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 6th November 2012
11:00am
Department of Statistics
UBC
A Bayesian extreme value analysis of debris flows
Debris flows carry a tremendous potential for physical destruction as well as a threat to human lives.
Quantitative analysis of their frequency and magnitude relation is key to the development of mitigation
measures to reduce debris flow hazard. Yet, the data available for such analysis are typically very
scarce, leading to point estimates for the return levels which are too imprecise to be of practical
value. Within a Bayesian framework for extreme value analysis, we demonstrate how additional sources
of information, in particular expert judgement, can be incorporated to produce more precise
estimates. We provide a rationale for the prior choice and discuss how its parameters can be elicited from
the expert's knowledge. A case study of the debris flows at Capricorn Creek in Western Canada is used to
illustrate our methodology.
BRG
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Thu 1st November 2012
4:00pm
Journal Club
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 30th October 2012
11:00am
Department of Economics
UBC
Testing the Number of Components in Finite Mixture
This paper considers likelihood-based testing of the null hypothesis of m0 components against
the alternative of m0 + 1 components in a finite mixture model. The number of components is
an important parameter in the applications of finite mixture models. Still, testing the number
of components has been a long-standing challenging problem because of its non-regularity.
We develop a framework that facilitates the analysis of the likelihood function of finite
mixture models and derive the asymptotic distribution of the likelihood ratio test statistic for
testing the null hypothesis of m0 components against the alternative of m0 + 1 components.
Furthermore, building on this framework, we propose a likelihood-based testing procedure of
the number of components. The proposed test, extending the EM approach of Li et al. (2009),
does not use a penalty term and is implementable even when the likelihood ratio test is difficult
to implement because of non-regularity and computational complexity.
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 23rd October 2012
11:00am
Department of Physics & Astronomy
UBC
Cosmology (for statisticians)
Our modern understanding of the Universe on the largest scales will be described, including the importance of several statistical tools for reaching that understanding.
Statistics
Room 4192 Earth Sciences Building, 2207 Main Mall, UBC.
Tue 16th October 2012
11:00am
Queensland Facility for Advanced Bioinformatics
University of Queensland
Australia
Multivariate analysis for ‘omics’ data exploration and integration
Recent advances in high throughput ‘omics’ technologies enable quantitative measurements
of expression or abundance of biological molecules of a whole biological system. The transcriptome,
proteome and metabolome are dynamic entities, with the presence, abundance and
function of each transcript, protein and metabolite being critically dependent on its temporal
and spatial location.
Whilst single omics analyses are commonly performed to detect between-group differences from
either static or dynamic experiments, the integration or combination of multi-layer information
is required to fully unravel the complexities of a biological system. Data integration relies on
the currently accepted biological assumption that the functional levels are related to one another.
Therefore, considering all the biological entities (transcripts, proteins, metabolites) as part of
a whole biological system is crucial to unravel the complexity of living organisms.
We are currently establishing a global analytical framework to extract relevant information
from high throughput ‘omics’ platforms such as genomics, proteomics, metabolomics and other
types of biological data. Specifically, the statistical methodologies that we have developed focus
on the so-called multivariate projection-based approaches, which can handle such large data
sets, deal with multicollinearity and missing values. These methodologies enable dimension
reduction by projecting these large data sets into a smaller subspace to capture the largest
sources of variation in the biological studies. These techniques enable exploration, visualisation
of the data and lead to biological insights.
Along with the integrative methodologies we have recently developed, a main focus in our
work has been to propose insightful graphical outputs to help the interpretation of the results.
I will introduce the application of such methodologies to a kidney transplant study from the
Prevention of Organ Failure Centre of Excellence and show how these integrative analyses of
large scale omics datasets can generate new knowledge not accessible by the analysis of a single
data type alone. I will also present the current developments made for longitudinal analyses
on that same study.
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 9th October 2012
11:00am
Department of Statistics & Actuarial Science
Simon Fraser University
Model Misspecification in Statistical Analysis
In general, model misspecification can lead to invalid inference for parameter estimation and risk prediction. In the context of quasi-likelihood inference, most of the existing statistical methods primarily focus on assessing the validity of the mean structure. However, limited work addresses the adequacy of the variance/covariance (Var/Cov) structure, and more specifically, a powerful, systematic statistical test for model misspecification in Var/Cov is lacking. In this talk, I will introduce a novel and unified framework for testing such misspecification. Our method shows substantial improvement and is more robust in comparison to several popular existing statistical methods.
In the context of risk prediction, I will talk about some challenges that arise in the evaluation of the incremental value in prediction accuracy by adding new biomarkers. In light of these challenges, we have proposed novel statistical procedures for systematically identifying potential subgroups that can benefit from the measurement of additional markers. Notably, our method is robust against possible model misspecification. Finally, I will discuss developing and evaluating absolute risk prediction models with newly identified biomarkers under nested case-control sampling design; here, measurement of biomarkers on the whole study population is neither feasible nor cost-effective.
BRG
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Thu 4th October 2012
4:00pm
Journal Club
Statistics
Earth Sciences Bldg, Room 4192, 2207 Main Mall, UBC
Tue 2nd October 2012
11:00am
Department of Statistics, UBC
Motivating statistical inference through the use of student-collected data and examples that pertain to students' lives
Statistical inference is an important topic in introductory statistics
courses and yet very challenging to teach. The concept of sampling
distribution is notoriously difficult for students to comprehend. In
order to motivate statistical inference and improve student conceptual
understanding in an introductory statistics course, we created an in-class
activity and assignments which involved students in the data collection
process. Lecture examples were developed based on data collected by
students and scenarios that pertain to students' lives. We also advocate
frequent use of graphical tools to help students visualize abstract
concepts such as repeated sampling and sampling distributions. Student
feedback indicated that the activities and examples are engaging and
useful to their learning.
Statistics
Room 4192, Earth Sciences Bldg, 2207 Main Mall, UBC.
Tue 25th September 2012
11:00am
PhD Candidate
Department of Statistics, UBC
Ensembling Classification Models Based on Phalanxes of Variables with Applications in Drug Discovery
Statistical detection of relevant rare class observations in a highly unbalanced two class situation is an interesting and challenging problem. In this work, we are interested in detecting rare chemical compounds that are active against a biological target, such as lung cancer tumor cells, as part of a drug discovery process. Rather than predicting the classes of the compounds, we rank all of the compounds in terms of probability of activity to produce a shortlist so that the maximum number of actives can be found at the very beginning of the shortlist. We have used four assay datasets and five descriptor sets, each rich in its number of variables, for each of the four assays. Capitalizing on the richness of variables in a descriptor set, we form the phalanxes by grouping variables together. The variables in a phalanx are good to put together, whereas the variables in different phalanxes are good to ensemble. We then form our ensemble by growing a random forest in each phalanx and aggregating them over the phalanxes. The performance of the ensemble of phalanxes is found to be better than that of its competitors, random forest and regularized random forest. Our ensemble performs very well when there are many variables in a descriptor set and when the proportion of active compounds is very small. In other words, the harder the problem is, the better the ensemble of phalanxes performs relative to the other two ensembles.
Statistics
WMAX110, PIMS, 1933 West Mall, UBC (please note location)
Tue 21st August 2012
11:00am
Meijiao Guan
MSc Candidate
Department of Statistics
UBC
Incorporating Prior Information into an Approach for Detecting Unusually Large Increase in MRI Activity in Multiple Sclerosis Patients
An increase of contrast enhancing lesions (CELs) on repeated magnetic resonance imaging (MRI) has been used as a safety indicator in multiple sclerosis (MS) clinical trials. A probability based procedure was proposed to quantify the likelihood of observing a value as large as the one seen on the new scan.
CEL counts from MRI images are reviewed by the Data Safety and Monitoring Board (DSMB) on a regular basis. At each DSMB review, the number of patients enrolled in the study could vary and the numbers of scans available across patients also vary. In particular, little data would typically be available in the early stages of the trial. Therefore, a Bayesian approach may be more reliable than a frequentist approach.
In this project, the negative binomial random effects model is fit to MS data. The results from previous similar clinical trials provide informative priors. Markov Chain Monte Carlo algorithms allow us to draw samples approximately from the joint posterior distribution. The inference on the conditional probability was carried out by both the Bayesian and the frequentist approaches. After applying these methods to simulated datasets and actual data from an MS study, we conclude that the Bayesian approach is more effective in identifying patients who have extreme lesion activity on the new scan.
Statistics
LSK 462, Leonard S. Klinck Bldg., 6356 Agricultural Road, UBC
Thu 16th August 2012
11:00am
Hongyang (Fred) Zhang
MSc Candidate
Department of Statistics
UBC
Linear Model Selection Based on Extended Robust Least Angle Regression
In variable selection problems, when the number of candidate covariates is relatively large, the "two-step" model building strategy, which consists of two consecutive steps, sequencing and segmentation, is often used. Sequencing aims to first sequence all the candidate covariates to form a list of candidate variables in which more "important" ones are likely to appear at the beginning. Then, in the segmentation step, the subsets of the first m (chosen by the user) candidate covariates which are ranked at the top of the sequenced list will be carefully examined in order to select the final prediction model. This thesis mainly focuses on the sequencing step.
Least Angle Regression (LARS), proposed by Efron, Hastie, Johnstone and Tibshirani (2004), is a powerful step-by-step algorithm which can be used to sequence the candidate covariates in order of their importance. Khan, J.A., Van Aelst, S., and Zamar, R.H. (2007) further proposed a robust version, Robust LARS. Robust LARS is robust against outliers and computationally efficient. However, neither the original LARS nor the Robust LARS is available for carrying out the sequencing step when the candidate covariates contain both continuous and nominal variables. In order to remedy this, we propose the Extended Robust LARS by proposing generalized definitions of correlations which include the correlations between nominal variables and continuous variables. Simulations and real examples are used to show that the Extended Robust LARS gives superior performance to two of its competitors, the classical Forward Selection and Group Lasso.
Statistics
LSK 462, Leonard S. Klinck Bldg., 6356 Agricultural Road, UBC
Thu 16th August 2012
11:00am
MSc Candidate
Department of Statistics
UBC
Two-Step and Likelihood Methods for Joint Models
Survival data often arise in longitudinal studies, and the survival process and the longitudinal process may be related to each other. Thus, it is desirable to jointly model the survival process and the longitudinal process to avoid the possibly biased and inefficient inferences that result from separate analyses. We consider mixed effects models (LME, GLMM, and NLME models) for the longitudinal process, and Cox models and accelerated failure time (AFT) models for the survival process. The survival model and the longitudinal model are linked through shared parameters or unobserved variables. We consider a joint likelihood method and two-step methods to make joint inference for the survival model and the longitudinal model. We have proposed linear approximation methods for joint models with GLMM and NLME submodels to reduce the computational burden and use existing software. Simulation studies are conducted to evaluate the performance of the joint likelihood method and the two-step methods. It is concluded that the joint likelihood method outperforms the two-step methods.
Statistics
LSK 460, Leonard S. Klinck Bldg., 6356 Agricultural Road, UBC.
Tue 14th August 2012
11:00am
MSc Candidate
Department of Statistics
UBC
Lower Quantile Estimation of Wood Strength Data
In wood engineering, lower quantile estimation is vital to the safety of construction with wood materials. In this presentation, we will first study the censored Weibull maximum likelihood estimate (MLE) of the lower quantile as in the current industrial standard D5457 (ASTM, 2004a) from a statistical point of view. According to our simulations, the lower quantile estimated by the censored Weibull MLE with the 10% empirical quantile as the threshold has smaller mean squared error (MSE) than the intuitive parametric or non-parametric quantile estimate. This advantage can be shown to be achieved by a good balance between the variance and bias with the help of the subjective censorship.
However, the standard D5457 (ASTM, 2004a) only utilizes a small (10%) and ad-hoc proportion of the data in the lower quantile estimation, which stimulates us to further improve it. First, we can consider fitting a more complex model, such as the Weibull mixture, to a larger (e.g., 70%) proportion of the data set with the subjective censorship, which leads to the censored Weibull mixture estimate of the lower quantile. Also, the bootstrap can be used to select a better censoring threshold for the censored Weibull MLE, which leads to the bootstrap censored Weibull MLE. According to our simulations, both proposals can yield better lower quantile estimates than the standard D5457, and the bootstrap censored Weibull MLE is better than the censored Weibull mixture.
Statistics
WMAX110, PIMS, 1933 West Mall, UBC (please note location and time)
Thu 2nd August 2012
2:00pm
Yuqing Wu
MSc Candidate
Department of Statistics
UBC
The Rocky Sleep Study and Multiple Imputation
The Rocky Sleep Study is a behavioural clinical trial about improving young babies' poor night sleep. I have been involved in this clinical study for over one year as the statistician. In this presentation I will explain the Rocky Sleep Study and what I have done with it in the last year. I will show the basic ideas of the Rocky Sleep Study as well as the analysis method and results. Moreover, I will explain the Multiple Imputation method, how it was developed and how it can be used to deal with missing data. Last, I will show how to use R to run the Multiple Imputation method.
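As a rough illustration of the pooling step in multiple imputation, the sketch below combines estimates from several imputed datasets using Rubin's rules. It is written in Python rather than the R mentioned in the abstract; the function name `pool_rubin` and the numbers are hypothetical and are not taken from the Rocky Sleep Study.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 3 imputed datasets (illustrative numbers only).
estimate, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance adds the between-imputation spread, inflated by 1 + 1/m, to the average within-imputation variance, so uncertainty due to the missing data is carried into the pooled standard error.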
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 19th July 2012
4:00pm
University Professor Emerita
Department of Statistics and Actuarial Science
University of Waterloo
The Canadian Statistical Institute: a new resource for statistical sciences
On June 4, 2012, the Statistical Society of Canada approved the formation of the Canadian
Statistical Institute (CSI), and a committee of SSC Presidents (past, present and future) is in
the process of setting up the initial Board of the CSI. The talk will describe briefly the vision
of the CSI put forward by the CSI Development Committee earlier this year, and steps which
will have to be taken in the first few months of the CSI to make it operational. Comments and
ideas from the audience will be invited.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Rd, UBC
Tue 17th July 2012
11:00am
Ardavan Saeedi
MSc Candidate
Department of Statistics, UBC
Nonparametric Bayesian Models for Markov Jump Processes
Modelling the time evolution of a dynamical system is required in many different fields, ranging from predicting disease progression in multiple sclerosis (MS) to modelling RNA evolution and analyzing communication networks. Among the simplest models for continuous-time dynamical systems are the Markov jump processes (MJPs), continuous-time Markov processes with piecewise constant paths. However, in complex settings, such as disease progression in MS, it is seldom the case that one has a defensible parametric MJP at one's disposal.
Nonparametric priors have recently attracted much attention in the statistics community, due to their flexibility, adaptability, usefulness in analyzing complex real world datasets, and their ability to sidestep model selection. We propose a nonparametric prior over MJPs. In particular, we propose a prior over the infinite rate matrices which characterize an MJP. These priors can be used in Bayesian models where an MJP is imposed on the data but the number of states of the MJP is unknown in advance.
A challenge in using these models is the problem of inference. We propose a Particle Markov chain Monte Carlo (PMCMC) algorithm, an MCMC with sequences proposed by a sequential Monte Carlo (SMC) algorithm, to carry out inference in these models. We introduce and compare two SMC proposals for problems in different fields, including estimating disease progression and modelling RNA evolution.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 28th June 2012
11:00am
MSc Candidate
Department of Statistics
UBC
A study of the relationship between CBOE Volatility Index (VIX) and realized volatility
The relationship between implied volatility and realized volatility of
financial return series has long been of interest to financial
econometricians. In this project, I focus on a special measure of
implied S&P 500 return volatility--VIX. Since it was introduced by CBOE
in 1993, VIX has been considered by many to be the world's premier
barometer of investor sentiment and market volatility. I first examine
the biasedness of the implied volatility as a predictor for near-term future
realized volatility by focusing on a Mincer-Zarnowitz style
regression. Then I use GARCH-type and ARFIMA models as base
models to test whether VIX contains incremental information beyond the
realized volatility. Including VIX as an exogenous predictor in these
models, I conclude that the implied volatility improves the
likelihood significantly and yields better out-of-sample predictive
performance. In addition, I find that VIX is related to the long-term S&P
500 index level, which might be of interest to stock market investors.
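For readers unfamiliar with the Mincer-Zarnowitz regression named in the abstract, the sketch below fits realized = a + b * forecast by ordinary least squares. An unbiased forecast corresponds to a = 0 and b = 1. The function name `mincer_zarnowitz` and the data are made up for illustration and are not the speaker's code or results; a full analysis would also test a = 0 and b = 1 jointly, which is omitted here.

```python
def mincer_zarnowitz(realized, forecast):
    """OLS fit of realized = a + b * forecast; returns (a, b)."""
    n = len(realized)
    if n != len(forecast) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_f = sum(forecast) / n
    mean_r = sum(realized) / n
    sxx = sum((f - mean_f) ** 2 for f in forecast)
    if sxx == 0:
        raise ValueError("forecast series has no variation")
    sxy = sum((f - mean_f) * (r - mean_r) for f, r in zip(forecast, realized))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_f
    return intercept, slope

# Made-up series in which the forecast systematically overstates realized volatility.
vix = [12.0, 15.0, 20.0, 30.0, 25.0]
realized = [0.8 * v - 1.0 for v in vix]
a, b = mincer_zarnowitz(realized, vix)
```

With these artificial inputs the fitted slope falls below 1 and the intercept below 0, the pattern one would read as a biased predictor.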
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Wed 20th June 2012
11:00am
Hao (Allen) Luo
MSc Candidate
Department of Statistics
UBC
Costs and Benefits of Environmental Data in Investigations of Gene-Disease Associations
The inclusion of environmental exposure data may be beneficial, in terms of statistical power, to the investigation of gene-disease association when gene-environment interaction exists. However, resources invested in obtaining exposure data could instead be applied to measure disease status and genotype on more subjects. In a cohort study setting, we consider the tradeoff between measuring only disease status and genotype for a larger study sample and measuring disease status, genotype, and environmental exposure for a smaller study sample, under the 'Mendelian randomization' assumption that the environmental exposure is independent of genotype in the study population. We focus on the power of tests for gene-disease association applied in situations where a gene modifies risk of disease due to a particular exposure without a main effect of gene on disease. Our results are equally applicable to exploratory genome-wide association studies and more hypothesis-driven candidate gene investigations. We further consider the impact of misclassification of environmental exposures. We find that under a wide range of circumstances research resources should be allocated to genotyping larger groups of individuals, to achieve a higher power for detecting the presence of gene-environment interactions by studying gene-disease association.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Wed 20th June 2012
11:00am
Dongxu Wang
MSc Candidate
Department of Statistics
UBC
Topics on the Effect of Non-differential Exposure Misclassification
There is quite an extensive literature on the deleterious impact of exposure misclassification when inferring exposure-disease associations, and on statistical methods to mitigate this impact. When the exposure is a continuous variable or a binary variable, a general mismeasurement phenomenon is attenuation in the strength of the relationship between exposure and outcome. However, few have investigated the effect of misclassification on a polychotomous variable. Using Bayesian methods, I investigate how misclassification affects the exposure-disease associations under different settings of the classification matrix. I also apply a trend test and assess the effect of misclassification through the power of the test. Moreover, since virtually all work on the impact of exposure misclassification presumes the simplest situation where both the true status and the classified status are binary, my work diverges from the norm in considering classification into three states when the actual exposure status is simply binary. Intuitively, the classification states might be labeled as 'unlikely exposed,' 'maybe exposed,' and 'likely exposed.' While this situation has been discussed informally in the literature, I provide some theory concerning what can be learned about the exposure-disease relationship, under various assumptions about the classification scheme. I focus on the challenging situation whereby no validation data are available from which to infer classification probabilities, but some prior assertions about these probabilities might be justified.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 19th June 2012
11:00am
PhD Candidate
Department of Statistics
UBC
Sparse MLE for Variable Screening in Ultra-High Dimensional Regressions
Variable selection plays a pivotal role in modeling the high dimensional data which nowadays appear in many areas of scientific research. Developing an efficient screening procedure to quickly reduce the number of candidate variables is essential in such analyses. Motivated by the seminal theory of sure independence screening (SIS; Fan and Lv (2008)), we propose a novel screening approach via the sparsity-restricted maximum likelihood estimator, namely SMLE. The SMLE estimates the high dimensional model coefficients in a designated low-dimensional subspace and screens out the irrelevant variables by setting their corresponding coefficients to zero.
The variables retained by the SMLE are then subject to more elaborate selection through the popular regularization methods (e.g. LASSO and SCAD). Compared with the SIS, which screens features based on the marginal correlations, the new method accounts for some joint effects between candidate variables and thus can be more reliable in applications. We establish the consistency of the proposed method in the context of (ultra) high dimensional generalized linear models and further develop an efficient algorithm for its implementation. The new method performs well in the numerical studies.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 12th June 2012
11:00am
PhD Candidate
Department of Statistics, UBC
Bayesian Phylogenetic Inference via Monte Carlo Methods
A main task in evolutionary biology is phylogenetic tree
reconstruction which determines the ancestral relationships among
different species based on observed molecular sequences, e.g. DNA
data. When a stochastic model, typically a continuous-time
Markov chain (CTMC), is used to describe the evolution, the
phylogenetic inference depends on unknown evolutionary parameters
(hyper-parameters). Bayesian inference provides a general framework
for phylogenetic analysis, able to implement complex models of
sequence evolution and to provide a coherent treatment of uncertainty
for the groups on the tree. However, the conventional computational
methods in Bayesian phylogenetics based on Markov chain Monte Carlo
(MCMC) cannot efficiently explore the huge tree space. We propose the
Combinatorial Sequential Monte Carlo (CSMC) method to generalize
applications of Sequential Monte Carlo (SMC) to non-clock tree
inference based on the existence of a flexible partially ordered set
(poset) structure. We show that the proposed CSMC algorithm is
consistent and fast in simulations. We also investigate two ways of
combining SMC and MCMC to jointly estimate the phylogenetic trees and
evolutionary parameters, particle Markov chain Monte Carlo (PMCMC)
algorithms with CSMC at each iteration and an SMC sampler with MCMC
moves. In this talk, these proposed methods will be demonstrated
using ribosomal RNA sequences of Chloroplast and DNA sequences of
Cichlid fishes.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 25th May 2012
11:00am
Department of Statistics
UBC
Strategies for Using Clickers in Undergraduate Statistics Teaching
Session will be 11:00 am to 12:30 pm.
On Friday 25th May, I will host a workshop in LSK 301 on the use of clickers in undergraduate Statistics teaching. The session should be helpful to anyone interested in the teaching and learning of Statistics, whether or not they have experience of using clickers as a student or an instructor. The workshop will be hands-on - so yes, we will be using clickers - and is provisionally scheduled to conclude at 12.30pm meaning there should be ample time to share and discuss ideas.
The session will review some of the different ways clickers can be used in the teaching of Statistics, and also offer some handy tips for how to make best use of clickers in a lecture. If you have any examples that you have tried or seen that you would like me to consider for inclusion, please let me know by Wednesday, 23rd May. Better still, if you'd like to talk about your use of clickers and run through an example, I'd be happy to involve you in the session.
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 24th May 2012
4:00pm
Department of Statistics
Department of Computer Science
Purdue University
Protein quantification from mass spectra: statistical methods and tools for overcoming sparsity and variation
Show Abstract
Mass spectrometry-based proteomics quantifies and compares the abundances of proteins in complex
biological mixtures. It enables a global profiling of the relatively abundant proteins with liquid
chromatography coupled with tandem mass spectrometry (LC-MS/MS), and a targeted profiling of
lower-abundance proteins with selected reaction monitoring (SRM). However, most experiments only quantify
a small subset of the proteome (typically between 50 and 600 proteins). Moreover, these experiments
do not quantify the proteins directly, but output a list of spectral features that are
subject to numerous sources of variation. The sparse nature of protein quantification and the
stochastic variation undermine our ability to make reliable and biologically relevant conclusions.
The goal of our work is to develop statistical methodology and bioinformatics tools to (i) accurately
quantify the abundance of the proteins based on the spectral features in LC-MS and SRM experiments,
(ii) propose experimental designs which reduce the cost and the time of the experiments without
compromising the accuracy, and (iii) maximize the biologically relevant interpretation from the
sparse lists of quantified proteins. This talk will describe statistical methods and software that
we have recently developed for these purposes.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 15th May 2012
11:00am
Lehrstuhl fuer Mathematische Statistik
Technische Universitaet Muenchen
Dependence modeling with vine copulas
Show Abstract
Flexible multivariate statistical dependence models are needed for many
data structures. While the popular multivariate normal distribution is
very restrictive and cannot account for features like asymmetry and
heavy tails, copulas can be used to build more flexible models.
Exploiting Sklar's famous theorem, which allows the dependence
structure to be separated from the marginal distributions, many
successful models have been developed in recent years. Much of this research
however is limited to the bivariate case, where numerous copulas are
available. This is unlike the multivariate case, where standard
multivariate copulas are rather restrictive in their structure. Vine
copulas do not suffer from such shortcomings and can be conveniently
constructed using only bivariate copulas as building blocks. In this
talk I introduce the concept of vine copulas and discuss appropriate
statistical inference techniques. This in particular includes issues of
model selection, which may be challenging in higher dimensions. As an
application I consider weather measurements of different variables like
temperature, humidity and pressure observed at Hohenpeissenberg, the
oldest mountain weather station in the world. Finally, I give an outlook
on how such models may be extended to data from multiple stations using a
hierarchical copula construction.
Joint work with Michael Pachali, Claudia Czado, and Christian Zang.
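As an illustration of how bivariate copulas serve as building blocks, the standard three-dimensional pair-copula decomposition can be written as follows. This is the textbook form rather than anything specific to the talk; the symbols $f_j$, $F_j$ (marginal densities and distribution functions) and $c_{12}$, $c_{23}$, $c_{13|2}$ (bivariate copula densities) are notation introduced here, and the last factor assumes, as is commonly done, that the conditional copula does not depend on the value of $x_2$.

```latex
f(x_1, x_2, x_3)
  = f_1(x_1)\, f_2(x_2)\, f_3(x_3)
    \cdot c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)
    \cdot c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr)
    \cdot c_{13|2}\bigl(F_{1|2}(x_1 \mid x_2),\, F_{3|2}(x_3 \mid x_2)\bigr)
```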
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 8th May 2012
11:00am
The Master at the Royal Mint: How much money did Newton save Britain?
Show Abstract
From the extant statistical data, this paper reconstructs several episodes in the history of the Royal Mint during Isaac Newton’s tenure. We discuss four types of uncertainty embedded in the production of coins, extending S. Stigler’s work (1977) back in time. The thirteen Jury Verdicts in Trials of the Pyx for 1696-1727 allow judgment on the impartiality of the Jury at the trials. The Verdicts, together with several remarks by Newton in his correspondence with the Treasury, allow us to estimate the standard deviation σ in weights of individual guineas coined before and during Newton’s Mastership. This parameter, in turn, permits us to estimate the amount of money Newton saved Britain after he put a stop to the illegal practice by goldsmiths and bankers of culling heavy guineas and recoining them to their advantage; a conservative estimate for savings to the Crown is £41,510, and possibly three times as much. The procedure with which he likely improved coinage gives historical insight into how important statistical notions (standard deviation and sampling) came to the forefront in practical matters: the former as a measure of variation of weights of coins, and the latter as a test of several coins to evaluate the quality of the entire population. Newton can be credited with the formal introduction of testing a small sample of coins, a pound in weight, in the trials of the Pyx from 1707 onwards, effectively reducing the size of admissible error. Even Newton’s “Cooling Law” could have been contrived for the purpose of reducing variation in the weight of coins during initial stages of the minting process. Three open questions are posed in the Summary.
Key words: Isaac Newton, Royal Mint, Trial of the Pyx, Jury Verdicts, guinea, remedy, margin in weight, Gaussian distribution, mean and standard deviation, small samples
Statistics
Room 212, Geography Bldg, 1984 West Mall, UBC
Tue 1st May 2012
11:00am
Senior Research Professor
Department of Statistics
University of Leeds, UK
New Non-Euclidean Statistical Methods and Modern Life-Sciences
Show Abstract
If the last century in Science belonged to the Physical Sciences, then this century must belong to the Life Sciences, with many breakthroughs arising from DNA and proteins! Proteins are biological macromolecules that are of primary importance to all living organisms, and there are various open problems, including the Nobel-Prize-type problem of protein folding. All these questions depend mainly on the shape of the protein in 3-D, which can be summarized in terms of either the configuration of points (landmarks) or, more compactly, conformational angles. This has led to new non-Euclidean statistical methods in shape analysis and directional data analysis. We will discuss the following topics with appropriate motivation:
- Protein alignment and Bayesian methods (Green and Mardia, 2006; Green, Mardia Vysaul and Ruffieux, 2010; Mardia et al., 2011)
- Statistical distribution of conformational angles and Ramachandran plots (Mardia, Taylor and Subramaniam, 2007)
- Prediction and simulation of protein structure (Boomsma, W., Mardia, K.V., Taylor, C.C., Ferkinghoff-Borg, J., Krogh, A. and Hamelryck, T., 2008).
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 17th April 2012
11:00am
Jessica Chen
MSc Student
Department of Statistics, UBC
Co-op Learning: Overall Work Experience and the Longitudinal Analysis performed during Co-op Terms
Show Abstract
This presentation is divided into two parts: the work experience I gained from my cooperative education placements, and the longitudinal analysis I carried out during my second placement. Part 1 covers general information about my two cooperative placements, a description of my overall work experience, and an overview of the work I did. Part 2 outlines the longitudinal analysis of monthly symptom survey data from children in Canada who had a life-threatening disease. The main purpose of this analysis was to estimate the effect size of changes in season on the total number, frequency, and distress of symptoms. A longitudinal study of 106 children in different cities in Canada was performed, collecting demographic variables and monthly symptom surveys. R software was used to fit linear mixed effects models. The analysis found a seasonal effect among these children, with a lower total number, frequency, and distress of symptoms in the month of August.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 3rd April 2012
11:00am
PhD Candidate
Department of Statistics, UBC
Tail dependence and its influence on risk measures
Show Abstract
Tail dependence and tail asymmetry are often observed in multivariate insurance and financial data, and modelling the tails correctly is important for quantifying risks of simultaneous large losses. Dependence modelling with copulas is now a common technique for inference on multivariate tails. When using copulas to account for those tail patterns, a fundamental task in risk management is to understand the tail behavior of the copula and its influence on risk measures such as Conditional Tail Expectation (CTE) and Value at Risk (VaR). We use tail order and tail order parameters to quantify the strength of tail dependence. The concepts of intermediate tail dependence and tail comonotonicity will then be discussed; the former can incorporate a wider range of dependence in the tails and the latter may lead to a data-driven conservative dependence structure. For these two tail dependence structures, relevant theoretical results, simulations and applications with insurance data will be presented.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 20th March 2012
11:00am
School of Finance and Statistics
East China Normal University
Testing Homogeneity in a Semi-parametric two-Sample Problem
Show Abstract
We study a two-sample homogeneity testing problem, in which one sample comes from a population with density $f(x)$ and the other is from a mixture population with mixture density $(1-\lambda)f(x) + \lambda g(x)$. This problem arises naturally in many statistical applications, such as testing for partial differential gene expression in microarray studies or genetic studies of gene mutation. Under the semi-parametric assumption $g(x)=f(x)e^{\alpha+\beta x}$, a penalized empirical likelihood ratio test could be constructed, but its implementation is hindered by the fact that there is neither a feasible algorithm for computing the test statistic nor any available research on its theoretical properties. To circumvent these difficulties, we propose an EM-test based on the penalized empirical likelihood. We prove that the EM-test has a simple chi-square limiting distribution, and we demonstrate its competitive testing performance by simulations. A real-data example is used to illustrate the proposed methodology.
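To make the exponential-tilt assumption concrete, here is a minimal sketch in one special case chosen purely for illustration: if $f$ is the standard normal density, then $g(x)=f(x)e^{\alpha+\beta x}$ with $\alpha=-\beta^2/2$ is the $N(\beta,1)$ density. The choice of $f$ and all function names are assumptions made here; the sketch only simulates from the model and does not implement the EM-test.

```python
import math
import random


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def tilted_density(x, beta):
    """g(x) = f(x) * exp(alpha + beta * x) with f standard normal.

    alpha = -beta^2 / 2 is the value that makes g integrate to one.
    """
    alpha = -0.5 * beta ** 2
    return normal_pdf(x) * math.exp(alpha + beta * x)


def mixture_density(x, lam, beta):
    """Mixture density (1 - lam) * f(x) + lam * g(x)."""
    return (1.0 - lam) * normal_pdf(x) + lam * tilted_density(x, beta)


def sample_mixture(n, lam, beta, rng):
    """Draw n observations from the mixture (illustrative f = N(0, 1))."""
    return [
        rng.gauss(beta, 1.0) if rng.random() < lam else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]
```

In this special case homogeneity corresponds to $\lambda=0$ or $\beta=0$, which is why the null hypothesis is not a single point in the parameter space.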
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 6th March 2012
11:00am
Department of Mathematics & Statistics
University of Victoria, BC
A Variational Bayes Spatiotemporal Hidden Markov Model for Electromagnetic Brain Mapping
Show Abstract
In this article we present a new variational Bayes approach for solving the
neuroelectromagnetic inverse problem arising in studies involving
electroencephalography (EEG) and magnetoencephalography (MEG). This spatial
problem involves the estimation of time-varying neural activity at a large
number of locations within the brain, from time series recorded at a
relatively small number of locations on or near the scalp. The
underdetermined nature of this estimation problem necessitates the use of
regularization methods, either through penalization, or through the
inclusion of priors in a hierarchical Bayes setting.
Framing this problem within the context of variable selection in a dynamic
linear model, we propose a mixture formulation, where the spatial profile of
activity within the brain is represented with a latent process governed by
an autologistic model. The autologistic model accommodates spatial
clustering in brain activation, while also allowing for the inclusion of
auxiliary information derived from alternative imaging modalities, such as
fMRI or PET. We develop a variational approach for approximate Bayesian
inference, and we compare this approach with several established methods,
including low-resolution electrical tomography (LORETA) and the well-known
minimum norm estimate. Joint work with Arif Babul, Alexander Moiseev, Mirza
Faisal Beg, and Naznin Virji-Babul.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 28th February 2012
11:00am
James Proudfoot
MSc Candidate
Department of Statistics, UBC
Climate Downscaling Methods
Show Abstract
Global climate models (GCMs) offer synoptic-scale weather data under different climate scenarios, but the grid on which data are available is often too sparse to be of real use. The goal of this talk is to introduce the field of climate downscaling and present a few downscaling techniques (both spatial and temporal), focusing on my work at Environment Canada with exponential dispersion models and PCA. Specifically, I'll be discussing some of the aspects of the Tweedie family of distributions which make them a straightforward choice for temporal downscaling with semi-continuous data, some techniques for scoring different stochastically simulated weather series, and a method for producing air temperature data at fine resolutions on complex terrains.
Statistics
Michael Smith Labs, Room 102, 2185 East Mall, UBC (please note location)
Tue 21st February 2012
11:00am
(student invited van Eeden speaker)
Department of Statistics
University of California - Berkeley
Statistics and Computation in the Age of Massive Data
Show Abstract
There are many issues remaining to be addressed, or even formulated,
at the interface of statistics and computation. One way to capture
the current state of affairs is the following: If we view data as a
resource, how can it be that in many practical problems of interest
we find ourselves embarrassed by being given too much data? The issue
is both statistical and computational: on a fixed computational budget
we are unable to guarantee that the statistical risk decreases as the
number of data points grows (without bound). A general theory not
yet being available, I present two initial forays into the problem
domain. The first is an exploration of the bootstrap in the regime
of very large data sets, where it is computationally infeasible to
obtain bootstrap resamples. I present a new procedure, the "bag of
little bootstraps," which inherits the favorable theoretical properties
of the bootstrap but is also scalable. The second is an exploration of
divide-and-conquer strategies for matrix completion. Here the theoretical
support is provided by concentration theorems for random matrices, and I
present a new approach to this problem based on Stein's method.
[Joint work with Ariel Kleiner, Lester Mackey, Purna Sarkar, Ameet
Talwalkar, Richard Chen, Brendan Farrell and Joel Tropp].
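The idea behind the bag of little bootstraps can be sketched for a very simple target, the standard error of a sample mean. This is a hedged illustration rather than the speaker's implementation: the function name, the subset-size exponent `gamma`, and the default numbers of subsets and resamples are all assumptions made here. Each small subset of size b is resampled to the nominal size n through multinomial counts, so every resample touches only b distinct data points.

```python
import math
import random
from collections import Counter


def blb_stderr_of_mean(data, gamma=0.6, n_subsets=5, n_resamples=50, seed=0):
    """Sketch of a bag-of-little-bootstraps standard error for the mean."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # size of each "little" subset

    subset_estimates = []
    for _ in range(n_subsets):
        # Draw a small subset without replacement.
        subset = rng.sample(data, b)
        means = []
        for _ in range(n_resamples):
            # Multinomial(n; 1/b, ..., 1/b) counts over the b subset points.
            # Drawing n indices keeps this sketch simple; a scalable version
            # would sample the counts directly.
            counts = Counter(rng.choices(range(b), k=n))
            means.append(sum(subset[i] * c for i, c in counts.items()) / n)
        centre = sum(means) / n_resamples
        variance = sum((m - centre) ** 2 for m in means) / (n_resamples - 1)
        subset_estimates.append(math.sqrt(variance))

    # Average the per-subset quality assessments.
    return sum(subset_estimates) / n_subsets
```

For independent data with unit variance, the returned value should be close to 1/sqrt(n), the usual standard error of the mean.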
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 14th February 2012
11:00am
Department of Statistics
Pennsylvania State University
Inference with implicit likelihoods for climate science and infectious disease
Show Abstract
Complex deterministic and stochastic models are often used to
describe dynamic systems in climate science, ecology and biology. Inferring
unknown parameters of these models is of interest, both for understanding the
underlying scientific processes as well as for making valid predictions. Some
of the challenges typically involved in inference for these models are:
likelihood functions that are intractable or only implicitly specified by a
computer model; computationally expensive model simulations; and high-
dimensional, multivariate observations and model output.
I will outline computationally expedient Gaussian process-based inferential
approaches in the context of two very different models, a deterministic Earth-
system model used in climate science, and a stochastic spatial model for
infectious diseases. I will point out some of the common features between the
two, but also highlight significant differences in the modeling frameworks
and inferential goals.
This talk is based on joint work with K. Sham Bhat (Los Alamos National
Labs), Roman Jandarov (Dept. of Statistics, Penn State University [PSU]),
Roman Tonkonojenkov (Dept. of Geosciences, PSU), Klaus Keller (Dept. of
Geosciences, PSU), Ottar Bjornstad (Center for Infectious Disease
Dynamics, PSU), and Bryan Grenfell (Ecology and Evolutionary Biology, Princeton University).
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 2nd February 2012
4:00pm
Department of Statistics
UBC
Nonstationary Modeling via Dimension Expansion
Show Abstract
If atmospheric, agricultural, and other environmental systems share one underlying theme, it is complex spatial structure, influenced by such features as topography and weather. For example, the air quality characteristics of cities are likely to be more similar to one another than to those of rural areas, irrespective of their geographic proximity. Ideally we might model these effects directly; however, information on the underlying causes is often not routinely available. Hence, when modeling environmental systems there is a need for a class of models that are more complex than those which rely on the assumption of stationarity.
In this talk, we propose a novel approach to modeling nonstationary spatial fields. The proposed method works by expanding the geographic plane over which these processes evolve into higher dimensional spaces, transforming and clarifying complex patterns in the physical plane. By combining aspects of multi-dimensional scaling, group lasso, and latent variable models, a dimensionally sparse projection is found in which the originally nonstationary field exhibits stationarity. Following a comparison with existing methods in a simulated environment, dimension expansion is studied on a classic test-bed data set historically used to study nonstationary models. We then explore the use of dimension expansion in modeling air pollution in the United Kingdom, a process known to be strongly influenced by rural/urban effects, amongst others, which give rise to a nonstationary field.
Statistics
Michael Smith Labs, Room 102, 2185 East Mall, UBC (please note location)
Tue 31st January 2012
11:00am
University of British Columbia, Research Associate.
Building Bridges: from Proteins and Genes to Instrumental Variables Estimators
Show Abstract
Recent advances in genomic and proteomic technologies have stimulated a large number of biomarker discovery studies in various disease contexts. This talk will be focused on the problem of measurement errors in mass spectrometry proteomic quantitation, which may affect the identification of protein biomarkers in a discovery study. As protein levels are regulated in part by gene expression, related genomic data can be integrated to address this problem through the implementation of instrumental variables estimators. These estimators are designed to provide unbiased and consistent regression parameter estimates using additional information provided by the instrumental variables (e.g., genes) to correct the measurement error. However, classical instrumental variables estimators may be seriously affected by outlying observations that are usually present in mass spectrometry proteomic data. We propose a new robust instrumental variables (RIV) estimator that is highly resistant to outliers and has attractive theoretical properties. We use RIV to identify human plasma proteomic biomarkers of cardiac allograft vasculopathy using related genomics data as instruments. I will close this talk by giving an overview of other problems related to the analysis and integration of genomic and proteomic data.
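The role of an instrument in correcting measurement error can be illustrated with the classical (non-robust) instrumental variables estimator for a single regressor. This sketch is not the RIV estimator described in the abstract; the simulation design and all names (`simulate`, `ols_slope`, `iv_slope`) are assumptions made here. A gene expression value acts as the instrument for a protein level that is observed with error.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def simulate(n, beta, rng):
    """Hypothetical design: gene -> true protein level -> outcome."""
    z, x_obs, y = [], [], []
    for _ in range(n):
        gene = rng.gauss(0.0, 1.0)                        # instrument
        protein_true = gene + rng.gauss(0.0, 1.0)         # unobserved level
        protein_measured = protein_true + rng.gauss(0.0, 1.0)  # with error
        outcome = beta * protein_true + rng.gauss(0.0, 1.0)
        z.append(gene)
        x_obs.append(protein_measured)
        y.append(outcome)
    return z, x_obs, y


def ols_slope(x, y):
    """Least-squares slope; attenuated when x carries measurement error."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Classical IV slope, cov(z, y) / cov(z, x), using z as the instrument."""
    return covariance(z, y) / covariance(z, x)
```

In this design the least-squares slope converges to two thirds of the true coefficient, while the IV slope is consistent for it. Both estimators are built from sample covariances, which is why outlying observations can affect them badly.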
Statistics
Michael Smith Labs, Room 102, 2185 East Mall, UBC (please note location)
Thu 26th January 2012
11:00am
Research Associate
Department of Statistics
Department of Biostatistics and Medical Informatics
University of Wisconsin-Madison
Adaptive procedures for false discovery rate estimation and control
Show Abstract
Multiple testing has generated surging interest in recent years due to the wide availability of large and complex modern data sets. Much research has focused on false discovery rate (FDR) estimation and control, and adaptive procedures have attracted particular attention. By incorporating good estimates of the proportion of true null hypotheses among all hypotheses, adaptive procedures have been shown to increase the power of detecting non-null hypotheses while maintaining the FDR. Most existing adaptive procedures rely on tuning parameters, which can be either assigned a priori (fixed) or estimated from data (dynamic). In this talk, I will first provide a finite sample proof of conservative point estimation for fixed adaptive FDR procedures. Then, I will present a general condition under which dynamic adaptive procedures can lead to conservative null proportion and FDR estimators. In addition, I will derive asymptotic results on FDR estimation and control for a class of dynamic adaptive procedures under some weak dependence condition. I will conclude by discussing applications of the FDR to high-throughput genomics data.
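A fixed-tuning-parameter adaptive procedure can be sketched with one well-known example: Storey's estimator of the null proportion, plugged into the Benjamini-Hochberg step-up rule. Whether the talk covers this exact estimator is an assumption; the function names and the choice `lam=0.5` are illustrative only.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the null proportion with fixed lam.

    Under the null, p-values are uniform, so about (1 - lam) of the null
    p-values are expected to exceed lam.
    """
    m = len(pvalues)
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def step_up_rejections(pvalues, alpha=0.05, pi0=1.0):
    """Number of rejections of the step-up rule at level alpha / pi0.

    pi0 = 1 gives the ordinary Benjamini-Hochberg procedure; plugging in an
    estimate below 1 gives an adaptive version.
    """
    if pi0 <= 0.0:
        raise ValueError("pi0 must be positive")
    m = len(pvalues)
    rejected = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * alpha / (m * pi0):
            rejected = i  # keep the largest index meeting its threshold
    return rejected
```

For example, with the ten p-values 0.001, 0.002, 0.003, 0.004, 0.005, 0.035, 0.55, 0.7, 0.8 and 0.9, the estimated null proportion is 0.8; the adaptive rule then rejects six hypotheses at level 0.05, whereas the unadjusted rule rejects five.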
Statistics
Michael Smith Labs, Room 102, 2185 East Mall, UBC (please note location)
Tue 24th January 2012
10:45am
Postdoctoral Fellow, Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Epigenetic changes in cancer revealed by whole-genome shotgun bisulfite sequencing data
Show Abstract
DNA methylation is a widely studied epigenetic mark known to be
implicated in tissue differentiation and disease, specifically cancer.
We have performed the first genome-wide analysis of changes in DNA
methylation in cancer, using whole-genome shotgun bisulfite sequencing
of 3 paired tumor-normal samples. We describe our statistical
analysis of this new type of data, which involves smoothing a binomial
process. Our analysis led to a number of new insights into cancer
epigenetics, including a description of the structure of local,
small-scale changes in cancer. We also show hypomethylation of
large-scale genomic domains encompassing more than half the genome.
Statistics
Michael Smith Labs, Room 102, 2185 East Mall, UBC (please note location)
Thu 19th January 2012
11:00am
Postdoctoral Fellow, Department of Biostatistics
Bloomberg School of Public Health
Johns Hopkins University
Incorporating Genotype Uncertainties into the Genotypic TDT
Show Abstract
Genotype imputation has become a standard option for researchers to expand their genotype datasets to improve signal precision and power in tests of genetic association with disease. In imputation for family-based studies, however, subjects are often treated as unrelated individuals: currently, only BEAGLE allows for simultaneous imputation for trios of parents and offspring, but only the most likely genotype calls are returned, not estimated genotype probabilities. For population-based SNP association studies, it has been shown that incorporating genotype uncertainty can be more powerful than using hard genotype calls. Here we investigate this issue in the context of case-parent family data. We present the statistical framework for the genotypic transmission-disequilibrium test (gTDT) using observed genotype calls and imputed genotype probabilities, derive an extension to assess gene-environment interactions for binary environmental variables, and illustrate the performance of our method on a set of trios from the International Cleft Consortium.
Statistics
Michael Smith Labs, Room 102, 2185 East Mall, UBC (please note location)
Tue 17th January 2012
10:45am
Bryan Howie
Postdoctoral Scholar, Stephens Lab
Department of Human Genetics, University of Chicago
Statistical methods for genotype imputation: Past, present, and future
Show Abstract
Over the past five years, one of the central pursuits in human genetics has been the use of genome-wide association studies to discover genetic variants that affect disease risk. These studies have identified thousands of well-supported associations between genetic markers and human traits, and their success can be partially attributed to a few key advances in statistical methodology. One of these advances, known as "genotype imputation," uses the correlation structure of genetic variation in a reference database (such as those provided by the HapMap and 1000 Genomes Projects) to predict genotypes that were not measured in a particular study. When these unobserved genotypes are estimated accurately, they can accelerate discovery by revealing new associated regions and providing greater detail in regions of known association.
In this talk, I will discuss the development of the genotype imputation field from its initial stages to its modern-day applications, with a focus on the major conceptual and statistical challenges that were confronted along the way. I will also discuss the prospects for improving imputation methodology, the role these methods will play as the human genetics field moves toward DNA sequencing technologies, and some of my new work in collaborative biological research.