Seminars

Statistics
Leonard S. Klinck 460, 6356 Agricultural Road, UBC
Mon 6th December 2010
3:00pm
Department of Statistics Stanford University
Visualization and Modeling of the Joint Behavior of Two Long Tailed Random Variables
 

Many of the variables relevant to online advertising have heavy tails. Keywords range from very frequent to obscure. Advertisers span a great size range. Host web sites range from very popular to rarely visited.

 

Much is known about the statistical properties of heavy tailed random variables.  The Zipf distribution and Zipf-Mandelbrot distribution are frequently good approximations.
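As a rough illustration of the Zipf and Zipf-Mandelbrot laws mentioned above, the sketch below computes rank probabilities proportional to 1 / (k + q)^s over a finite set of ranks. The function name, the parameter names `s` and `q`, and the choice of 1000 ranks are assumptions made for this example only and are not taken from the talk.

```python
def zipf_mandelbrot_pmf(n_items, s, q=0.0):
    """Probabilities proportional to 1 / (k + q)**s for ranks k = 1..n_items.

    With q = 0 this reduces to the (finite) Zipf distribution.
    """
    weights = [1.0 / (k + q) ** s for k in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical setting: 1000 ranked items with exponent s = 1.
pmf = zipf_mandelbrot_pmf(n_items=1000, s=1.0)

# Share of total probability carried by the ten most frequent items,
# a simple way to see how much mass sits in the head of the distribution.
head_share = sum(pmf[:10])
```

Raising `q` flattens the head of the distribution while leaving the tail decay governed by `s`, which is the usual reason for preferring the Zipf-Mandelbrot form over plain Zipf.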

 

Much less attention has been paid to the joint distribution of two or more such quantities.  In this work, we present a graphical display that shows the joint behavior of two long tailed random variables.  For ratings data (Netflix movies, Yahoo songs) we often see a strong head to tail affinity where the major players of one type are over-represented with the minor players of the other.  We look at several examples which reveal properties of the mechanism underlying the data.  Then we present some mathematical models based on bipartite preferential attachment mechanisms and a Zipf-Poisson ensemble.

 

This is joint work with Justin Dyer.

 

 
 
  
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 25th November 2010
4:00pm
Assistant professor of Biostatistics. Faculty of Health Sciences, SFU
CANCELLED Meta-analysis of Observational Data

Meta-analysis is a statistical method used to combine the results of different studies in order to draw conclusions about a body of research. For example, one might imagine extracting hazard ratios and odds ratios from a collection of health research papers looking at the effectiveness and safety of a drug (e.g. antidepressants). An emerging area of innovation in statistics involves meta-analysis of observational studies. Unlike randomized controlled trials, which are the gold standard for proving causation, observational studies are prone to biases such as confounding and measurement error. In this talk I will give an overview of meta-analysis of observational studies and draw parallels with sensitivity analysis techniques and Bayesian analysis. I will motivate the discussion with the example of a meta-analysis of the relationship between oral contraceptive use and endometriosis.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 23rd November 2010
11:00am
Graciela Boente
Departamento de Matematicas and Instituto de Calculo - Universidad de Buenos Aires and CONICET.
Robust inference in generalized linear models with missing responses

The generalized linear model (GLM) (McCullagh and Nelder, 1989) is a popular technique for modelling a wide variety of data. It assumes that the observations are independent and that the conditional distribution of the response given the covariates belongs to a canonical exponential family. Robust procedures for generalized linear models have been considered, among others, by Stefanski et al. (1986), Kunsch et al. (1989), Bianco and Yohai (1996), Cantoni and Ronchetti (2001), Croux and Haesbroeck (2003) and Bianco et al. (2005). Recently, robust tests for the regression parameter under a logistic model were considered by Bianco and Martinez (2009).

 

In practice, some response variables may be missing, by design (as in two-stage studies) or by happenstance. The methods described above are designed for complete data sets, and problems arise when missing observations are present. In this talk, we focus our attention on those cases in which missing data occur only in the responses. This situation is frequent in opinion polls, socio-economic investigations, medical studies and other scientific experiments where the explanatory variables can be controlled. In these studies outliers can also be present, so robust procedures need to be considered.

 

We consider robust estimators for the regression parameter of a generalized linear model in order to build test statistics for this parameter when missing data occur in the responses. When there are no missing data, these estimators include the family of estimators previously studied by several authors such as Bianco and Yohai (1996), Cantoni and Ronchetti (2001), Croux and Haesbroeck (2003) and Bianco et al. (2005). The robust estimates are asymptotically normally distributed, which allows us to construct robust testing procedures. The asymptotic distribution of the test statistic under contiguous alternatives is also obtained. The sensitivity of the procedures to single outliers will be studied through their influence function, while the finite sample properties of the proposed procedure are investigated through a Monte Carlo study in which the robust test is also compared with nonrobust alternatives.

 

References:

 

Bianco, A., Garcia Ben, M. and Yohai, V. (2005). Robust estimation for linear regression with asymmetric errors. Canad. J. Statist., 33, 511-528.

 

Bianco, A. and Martinez, E. (2009). Robust testing in the logistic regression model. Comp. Statist. Data Anal., 53, 4095-4105.

 

Bianco, A. and Yohai, V. (1996). Robust estimation in the logistic regression model. Lecture Notes in Statistics, 109, 17-34. Springer-Verlag, New York.

 

Cantoni, E. and Ronchetti, E. (2001). Robust inference for generalized linear models. Journal of the American Statistical Association, 96, 1022-1030.

 

Croux, C. and Haesbroeck, G. (2003). Implementing the Bianco and Yohai estimator for logistic regression. Comp. Statist.  Data Anal., 44, 273-295.

 

Kunsch, H., Stefanski, L. and Carroll, R. (1989). Conditionally unbiased bounded influence estimation in general regression models with applications to generalized linear models. J. Amer. Statist. Assoc., 84, 460-466.

 

McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. London: Chapman and Hall.

 

Stefanski, L., Carroll, R. and Ruppert, D. (1986). Bounded score functions for generalized linear models. Biometrika, 73, 413-424.

 

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 18th November 2010
4:00pm
Professor of Children's Environmental Health SFU Senior Scientist at the Child & Family Research Institute, BC Children’s Hospital
Low-level lead Toxicity: Much Ado About Nothing?

In a series of studies, we found that levels of lead below 10 micrograms per deciliter of whole blood (levels that are currently considered by the World Health Organization and Health Canada to be protective for children) were associated with diminished intellectual abilities in children. Indeed, the lowest levels of exposure were associated with greater decrements in IQ scores. This presentation will review the evidence for a non-linear, dose-response relationship of lead exposure with intellectual decrements and discuss the implications for policy.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 9th November 2010
11:00am
Laboratoire JA Dieudonné Université de Nice - Sophia Antipolis University of Nice, France. PIMS Visitor, UBC.
Particle systems, definitions and proof of convergence (uniformly in time)

I will talk about particle systems used to approximate conditional laws. I will present the classic system and a system called the interacting Kalman filter. I will give some elements of the proof of why this last system performs well uniformly in time.

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 4th November 2010
4:00pm
Yinshan Zhao
Research Associate MS/MRI Research Group, Department of Medicine, UBC
A statistical approach for detecting unusually large increases in MRI activity in multiple sclerosis
 
We will give a brief introduction to an ongoing research project entitled "Improving Safety Monitoring and Designs of Future Multiple Sclerosis (MS) Clinical Trials" with emphasis on using repeated magnetic resonance imaging (MRI) measures as a safety monitoring tool in MS clinical trials. A brief summary of this research is given below.
 
Data and Safety Monitoring Boards (DSMBs) who oversee ongoing clinical trials review patients' information on a regular basis for potential safety issues. In MS clinical trials, an unexpected increase of contrast enhancing lesions (CELs) at the individual-patient level has been used by DSMBs as an early warning of disease worsening. However, there are no published studies that clearly identify what should be considered an "unexpected increase" of CEL activity. The existing guidelines often rely on a pre-defined threshold, such as five or more CELs above baseline level. This simple approach fails to account for variation across individual patients and clinical trial cohorts and does not utilize all the CEL information available. We consider a probability-based index for detecting the increase, that is, the likelihood of observing a lesion count as large as that observed on the recent scans of a patient, conditional on the patient's lesion counts in the past. To estimate this conditional probability, we utilize models with patient-specific random effects. Given the patient's random effect, we assume that the lesion counts from the same patient follow a negative binomial distribution and are either mutually independent or have an AR(1) dependence structure. Under this framework, the conditional probability can be evaluated based on the activity level of the overall cohort, the distribution of the random effects and the parameters that describe the within-patient over-dispersion and AR(1) dependence. As large variability exists among MS cohorts, we propose to estimate these model components using the data from the study cohort and to update the estimates whenever new data become available. We considered two different procedures for estimating the random effects distribution and evaluated the performance of our methods using both simulations and data from two clinical trials.
 

  

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 26th October 2010
11:00am
Department of Statistics, UBC.
Developing Expert-like Behaviour in Undergraduate Statistics Students

 "Celebrate Learning Week"  

The goal of undergraduate instruction is to engender expert-like traits in the learners.  Expert-like behaviour is described in general, along with how someone becomes expert in a given field. Suggestions are proposed as to which skills an undergraduate program in Statistics should promote and how instruction might best transform students into expert-like thinkers in the discipline. 

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 14th October 2010
4:00pm
MSc student Department of Statistics, UBC
Regression Approaches to Estimation of Relative Risk: Application to MS Studies

Using a log link for binary response in generalized linear mixed-effects models (GLMM) allows direct estimation of the relative risk. If a random intercept is the only random effect in the conditional mean structure, the marginal mean has the same form. The fixed effects, representing the log relative risks, have the same interpretation in both the mixed-effects model and the marginal model. This leads to two approaches to estimate the relative risks: 1) maximum likelihood for the mixed-effects models and 2) the generalized estimating equations (GEE) approach for the marginal models.
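The claim that the marginal mean keeps the same form under a log link can be checked in one line. The notation below (subject index i, occasion index j, intercept \beta_0, random intercept b_i independent of the covariates with finite E[e^{b_i}]) is an assumption introduced for this sketch and is not taken from the talk.

```latex
\begin{align*}
\Pr(Y_{ij}=1 \mid b_i, x_{ij}) &= \exp\!\left(\beta_0 + x_{ij}^{\top}\beta + b_i\right), \\
\Pr(Y_{ij}=1 \mid x_{ij}) &= \mathrm{E}_{b_i}\!\left[\exp\!\left(\beta_0 + x_{ij}^{\top}\beta + b_i\right)\right]
  = \exp\!\left(\beta_0^{*} + x_{ij}^{\top}\beta\right),
  \qquad \beta_0^{*} = \beta_0 + \log \mathrm{E}\!\left[e^{b_i}\right].
\end{align*}
```

Only the intercept changes when the random intercept is integrated out, so the slope vector \beta (the log relative risks) is shared by the conditional and marginal models. If one further assumes b_i \sim N(0, \sigma^2), the shift is \beta_0^{*} = \beta_0 + \sigma^2/2.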

 

In our study, we apply such log-linear models to assess the effects of neutralizing antibodies on interferon beta-1b in relapsing-remitting multiple sclerosis. The results obtained by the two approaches are compared. The relative efficiency of the GEE approach and the robustness of the GLMM approach to some forms of misspecification of the model for the random effects are studied by simulations.

 
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 12th October 2010
11:00am
Head of the Institute of Statistics of the University of Klagenfurt, Austria. Visiting Professor, Department of Statistics, UBC.
Model-based spatial prediction and design

The major disadvantage with conventional spatial (Kriging) interpolation methodology is the fact that the claimed property of best linear unbiased prediction (BLUP) no longer holds when estimates of the spatial covariance parameters are plugged in. In my talk I report on recent work with my colleagues Gunter Spoeck and Hannes Kazianka in the area of Bayesian spatial prediction and design. The Bayesian approach not only offers more flexibility in modeling but also allows us to deal with uncertain covariance parameters, and it leads to more realistic estimates for the predicted variances.
 

We report on some experiences gained with our approach during a European project on "Automatic mapping of radioactivity in case of emergency". Moreover, I report on recent results on finding objective priors for the crucial nugget and range parameters of the widely used Matern family of covariance functions. Finally, I will consider the problem of choosing an "optimal" spatial design, i.e. finding an optimal spatial configuration of the observation sites minimizing the mean squared error of prediction over an area of interest. Using Bessel-sine/cosine expansions for random fields we arrive at a design problem which is equivalent to finding optimal Bayes designs for linear regression models with uncorrelated errors, for which powerful methods and algorithms from convex optimization theory are available.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 5th October 2010
11:00am
Whipple V. N. Jones Professor of Statistics and Chair Department of Statistics, Harvard University
30 Years of Bootstrap and Multiple Imputation: Joint Replications versus Conditional Replications

The papers by Efron (1979, Annals of Statistics) and Rubin (1978, Proceedings of the ASA) are generally regarded as the birth of the bootstrap method and multiple imputation (MI), respectively. The proximity of their births is perhaps not entirely coincidental, and indeed over the years there has been some confusion over their similarities and differences.  Both rely on the principle of replications to assess statistical variability, are conceptually deep for specialists yet appealing for users, and require intensive computation (though for different parties).  The bootstrap, however, has been used mainly for “one-party inference” and typically requires a large number of joint replications, namely, the entire sample needs to be replicated. In contrast, MI is primarily designed for “two-party inference” in the context of dealing with incomplete data: the imputer being one party and any potential user being the other. Furthermore, MI often only requires a small number of replications (i.e., imputations), because of its reliance on conditional replications (i.e., conditioning on the observed data) and on a key ANOVA-type variance decomposition. The decomposition can be justified from three different perspectives: Bayesian, likelihood, and design-based, when the two parties are congenial to each other. However, when the two parties are uncongenial toward one another, the story becomes considerably more intriguing …
Statistics
MATH 203, 1984 Mathematics Road, UBC
Mon 4th October 2010
2:00pm
Whipple V. N. Jones Professor of Statistics and Chair Department of Statistics, Harvard University
The Making of Sexy Statistics and Statisticians: Some Recent Harvard Experiments
Hal Varian, Google’s Chief Economist, has been widely quoted as predicting that “the sexy job in the next ten years will be statisticians.”  Whereas predicting the future is typically difficult, this is an easy one. The demand for statistics and statisticians is such that statisticians are now both desired and feared (Meng, 2009, 2010, American Statistician), and we are under tremendous pressure to deliver both quantity and quality (Meng, 2009, 2010, Amstat News).  To meet this demand, we need to rethink, reform, and redouble our pedagogical efforts, from general education to Ph.D. training. This talk provides an overview of some recent experiments conducted at the Harvard Department of Statistics. These include a real-life topic module-based General Education course---Real-Life Statistics: Your Chance for Happiness (or Misery)---for undergraduate students who are afraid of anything quantitative; a required course for all first-year Ph.D. students---The Art and Practice of Teaching Statistics---for developing both teaching ability and communication skills in general; an all-faculty participated course---Problem Solving in Statistics---aimed primarily at second-year Ph.D. students, to help them prepare for qualifying exams as an intensified learning opportunity for research (Blitzstein and Meng, to appear, American Statistician); a workshop course designed for third year and beyond Ph.D. students---Research Cultivation and Culmination---guiding them through the whole process of developing an idea into a full publication; and a Graduate Seminar in General Education---Statistical Fallacies and Paradoxes: A Cartoon Guide---a course interweaving research and pedagogy by involving a group of Ph.D. and masters students in turning research findings and publications on foundational statistical thinking into teaching material for introductory courses at the undergraduate level.

This talk is intended for everyone, as many of these experiments can be carried out in any department. Even for those who have no interest in statistics or pedagogy, this talk may provide an entertaining (and sexy) ending … 
Statistics
MSL 102, Michael Smith Labs, 2185 East Mall, UBC
Fri 1st October 2010
4:00pm
Whipple V. N. Jones Professor of Statistics and Chair Department of Statistics, Harvard University The 2010/11 Constance van Eeden Distinguished Lecturer (This talk will be followed by a reception at 5:00 pm in MSL 101).
Trivial Mathematics but Deep Statistics: Simpson’s Paradox and Its Impact on Your Life
   
Few paradoxes have impacted everyday life more than Simpson’s Paradox has. Yet paradoxically, Simpson’s paradox is not even a paradox in the mathematical sense. Simple arithmetic can easily show that it is possible for a surgeon to have the highest overall success rate and yet have the lowest success rate for each type of surgery performed. The fact that this phenomenon may feel counterintuitive is precisely the reason that Simpson’s paradox has led to many erroneous conclusions and decisions that affect people’s lives, particularly those from social and medical studies, where comparisons using aggregated data are routinely performed. This talk demonstrates the danger of Simpson’s paradox via a number of real-life examples, from the famous Berkeley sex bias case to measuring disparity in mental health service based on the recently released National Latino and Asian American Study (NLAAS), and from batting averages to a recent debate on unemployment rates (Wall Street Journal, December 2, 2009). No statistical background is required to understand this talk, only some common sense and a desire to think deeply beyond formulas.
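The surgeon example above can be reproduced with a few lines of arithmetic. The counts below are invented purely for illustration (they are not from the talk): surgeon A mostly takes routine cases and surgeon B mostly takes complex ones.

```python
from fractions import Fraction

# Hypothetical (successes, operations) counts per surgeon and surgery type.
records = {
    "A": {"routine": (81, 90), "complex": (3, 10)},
    "B": {"routine": (10, 10), "complex": (36, 90)},
}


def overall_rate(by_type):
    """Success rate after pooling all surgery types for one surgeon."""
    successes = sum(s for s, _ in by_type.values())
    operations = sum(n for _, n in by_type.values())
    return Fraction(successes, operations)


# Success rate within each surgery type, kept as exact fractions.
per_type = {
    name: {kind: Fraction(s, n) for kind, (s, n) in by_type.items()}
    for name, by_type in records.items()
}
overall = {name: overall_rate(by_type) for name, by_type in records.items()}

for name in records:
    rates = ", ".join(f"{kind}: {float(r):.0%}" for kind, r in per_type[name].items())
    print(f"Surgeon {name}: overall {float(overall[name]):.0%} ({rates})")
```

With these made-up counts, A has the better pooled rate while B has the better rate within each surgery type, because the pooled rate also reflects how each surgeon's caseload is split between easy and hard operations.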
 
(This is also a G-rated talk because it is a “gadgeted” seminar. Never heard of it? Well, this is your chance …)
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 28th September 2010
11:00am
Jin Zi
Postdoctoral Fellow, Department of Statistics UBC
Aspects of Composite Likelihood Inference

A composite likelihood consists of a combination of valid likelihood objects. It has been shown to be a good and practical alternative to the ordinary full likelihood when the full likelihood function is intractable, or difficult to evaluate due to complex dependencies. The resulting estimator enjoys desirable asymptotic properties such as consistency and asymptotic normality. In this talk we aim to compare the performance of composite likelihood estimation relative to estimation based on the full likelihood. Analytical and simulation results will be presented for different models. We will show that the composite likelihood approach is highly efficient, and that for a few but important cases the composite likelihood is fully efficient, with estimators identical to those from the full likelihood.

Statistics
WMAX216 (PIMS), 1933 West Mall, UBC
Mon 20th September 2010
2:00pm
Research Scientist, Environmental Health, Agriculture and Agri-Food Canada (AAFC), Lethbridge
Statistical modeling of complex agroecosystems in a changing climate

Natural resource problems typically must be modeled using data that is often incomplete, asynchronous and collected at different spatial and temporal scales with differing levels of measurement uncertainty. Both deterministic and stochastic models are widely applied in assessing environmental impacts, identifying risks and informing resource management decision-making for agricultural systems. However, existing models are high-dimensional, requiring extensive site-specific calibration, thereby limiting their spatial application. Likewise, simpler models, inevitably, must be combined to aid in more robust, integrative regional management or national policy-relevant decision-support. Using variable- and model-selection statistical techniques, one can identify models of intermediate complexity that can achieve appreciable reductions in parameter and structural uncertainty. In this way, such models may offer more reliable support to address a range of applications/problems and to identify critical thresholds and allocation trade-offs.

 

My talk will discuss several collaborative, inter-disciplinary projects that are investigating ways to improve the prediction and forecasting of crop production for food and energy in relation to water-use efficiency and climate variability across Canada.

 

I will highlight the use of wireless sensor monitoring networks and satellite remote-sensing data. The talk will also showcase several national-scale, web-based decision-support systems currently in development. Here, the ability to refine and adapt models to take into account spatial and temporal operational constraints is of vital importance.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Wed 25th August 2010
2:00pm
Yan (Lucy) Cheng
MSc Student Department of Statistics, UBC
Wood Property Relationships and Survival Models in Reliability
Understanding the relationships between the different strength properties of lumber, and between the strength properties and covariates such as visual grading characteristics, has been a topic of great interest in wood engineering. In our mechanical wood strength tests, each piece fails (breaks) once a continuously increasing load reaches a certain level. The response of the test is the wood strength property, load to failure, which is a special form of time-to-event survival data and therefore calls for survival analysis. This topic is also called reliability analysis in engineering.
 
We apply the methodologies of survival analysis to the wood strength data collected in the FPInnovations (FPI) laboratory. In addition to making the current lumber grading system more powerful and reliable, this predictive model would also provide a method for matching pieces of lumber using significant X's, addressing the problem that two tests cannot be conducted on the same piece of lumber. In this report, we present the basic concepts, parametric methods, nonparametric methods, a semi-parametric model and a parametric model for analyzing survival data.
 
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 24th August 2010
11:00am
Eric Cormier
MSc student Department of Statistics, UBC
Time-Varying Exposure Subject to Misclassification: Bias Characterization and Adjustment

Measurement error occurs frequently in observational studies investigating the relationship between exposure variables and a clinical outcome. Error-prone observations on the explanatory variable may lead to biased estimation and loss of power in detecting the impact of an exposure variable. When the exposure variable is time-varying, the impact of misclassification is complicated and significant. This increases uncertainty in assessing the consequences of ignoring measurement error associated with observed data, and brings difficulties to adjustment for misclassification.

In this study we considered situations in which the exposure is time-varying and nondifferential misclassification occurs independently over time. We determined how misclassification biases the exposure outcome relationship through probabilistic arguments and then characterized the effect of misclassification as the model parameters vary. We show that misclassification of time-varying exposure measurements has a complicated effect when estimating the exposure-disease relationship. In particular the bias toward the null seen in the static case is not observed.

After misclassification had been characterized we developed a means to adjust for misclassification by recreating, with greatest likelihood, the exposure path of each subject. Our adjustment uses hidden Markov chain theory to quickly and efficiently reduce the number of misclassified states and reduce the effect of misclassification on estimating the disease-exposure relationship.

The method we propose makes use of only the observed misclassified exposure data, and no validation data need to be obtained. This is achieved by estimating switching probabilities and misclassification probabilities from the observed data. Once these estimates are obtained, the effect of misclassification can be determined through the characterization of the effect of misclassification presented previously. We can also directly adjust for misclassification by recreating the most likely exposure path using the Viterbi algorithm.
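To make the last step concrete, here is a minimal sketch of the Viterbi algorithm for a binary exposure observed with error. It is not the author's implementation: it assumes a symmetric two-state chain in which the true exposure switches with probability `p_switch` at each time step and is misclassified with probability `p_misclass`, and it takes both probabilities as given rather than estimating them from the data. All names and the example sequence are hypothetical.

```python
import math


def viterbi_binary(observed, p_switch, p_misclass, p_initial=0.5):
    """Most likely true 0/1 exposure path given error-prone 0/1 observations.

    Assumes 0 < p_switch < 1, 0 < p_misclass < 1 and 0 < p_initial < 1,
    where p_initial is the probability that the first true state is 1.
    """
    def log_trans(prev, cur):
        return math.log(p_switch if prev != cur else 1.0 - p_switch)

    def log_emit(state, obs):
        return math.log(p_misclass if state != obs else 1.0 - p_misclass)

    log_init = [math.log(1.0 - p_initial), math.log(p_initial)]
    # score[s] = best log-probability of any path ending in state s so far.
    score = [log_init[s] + log_emit(s, observed[0]) for s in (0, 1)]
    backpointers = []
    for obs in observed[1:]:
        new_score, pointers = [], []
        for cur in (0, 1):
            candidates = [score[prev] + log_trans(prev, cur) for prev in (0, 1)]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + log_emit(cur, obs))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state to recover the whole path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical observed exposure sequence for one subject.
observed = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
path = viterbi_binary(observed, p_switch=0.05, p_misclass=0.2)
```

When switching is rare relative to misclassification, isolated flips in the observed sequence are treated as likely recording errors and smoothed away, which is how the reconstruction reduces the number of misclassified states.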

The methods developed in this dissertation allow the effect of misclassification on estimating the exposure-disease relationship to be determined. They account for misclassification by reducing the number of misclassified states and allow the exposure-disease relationship to be estimated significantly more accurately. They do this without the use of validation data and are easy to implement in existing statistical software.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Mon 23rd August 2010
3:00pm
MSc Student Department of Statistics, UBC
Joint Inference for Longitudinal and Survival Data with Incomplete Time-dependent Covariates

In many longitudinal studies, individual characteristics associated with the repeated measures may be covariates for the time to an event of interest. Thus, it is desirable to model the survival process and the longitudinal process together. Statistical analysis may be complicated by missing data or measurement errors in the time-dependent covariates. This thesis considers a nonlinear mixed-effects model for the longitudinal process and the Cox proportional hazards model for the survival process. We provide a method based on the joint likelihood for nonignorable missing data, and we extend the method to the case of time-dependent covariates. We adapt a Monte Carlo EM algorithm to estimate the model parameters. We compare the method with the existing two-step method, with some interesting findings. A real example from a recent HIV study is used as an illustration.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Mon 23rd August 2010
3:00pm
Yihui (Eva) Luo
MSc Student Department of Statistics, UBC
Tail dependence of financial returns

Nikoloulopoulos et al. [2010] compared several bivariate copulas, such as the t, BB1 and other copulas, fitted to GARCH(1,1)-filtered financial stock returns. They showed that the BB1 copula-GARCH model performed relatively better than the others in terms of likelihood fit and prediction of extreme quantiles. This project is conducted to test the assumption that the estimates provided by such a parametric copula-GARCH model under the maximum likelihood method are not reliable, and that the model therefore does not necessarily give a trustworthy measurement of tail dependence.

 

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 19th August 2010
3:00pm
Libo Lu
Master's student Department of Statistics, UBC
An Approximate Inference Method for Analyzing Joint Models in Longitudinal Studies

Longitudinal studies often involve several related processes, such as a longitudinal process and a time-to-event process, and the association between them requires joint modeling for unbiased estimation. The computation for joint models, for example with the EM algorithm, can be extremely intensive and can lead to convergence problems.

 

In this talk, we introduce an approximate likelihood-based inference method for jointly modeling a longitudinal process and a time-to-event process based on an NLME model and a parametric AFT model. By linearizing the joint model, we design a strategy for updating the random effects that connect the two processes, and propose two frameworks for different scenarios of the likelihood function. Both frameworks approximate the multidimensional integral in the observed-data joint likelihood by an analytic expression, which greatly reduces the computational intensity of the complex joint modeling problem. The new method looks promising in terms of both estimation results and computational efficiency, especially when more subjects are available.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 19th August 2010
3:00pm
Master's student Department of Statistics, UBC
Properties of Empirical and Adjusted Empirical Likelihood

Likelihood-based statistical inference has been advocated by generations of statisticians. As an alternative to the traditional parametric likelihood, empirical likelihood is appealing for its nonparametric setting and desirable asymptotic properties. In this thesis, we first review and investigate the asymptotic and finite-sample properties of the empirical likelihood, particularly its implications for the construction of confidence regions for the population mean. We then focus on the properties of the adjusted empirical likelihood. The adjusted empirical likelihood was introduced to overcome the shortcomings of the empirical likelihood when it is applied to statistical models specified through general estimating equations. We discover several finite-sample properties of the adjusted empirical likelihood, mainly in its application to constructing confidence regions for the population mean. One important discovery is that the original adjusted empirical likelihood gives a bounded likelihood ratio statistic, which may cause problems when the sample size is not large enough or the nominal confidence level is too high. We propose a possible approach to modifying the adjusted empirical likelihood so as to obtain an unbounded likelihood ratio statistic.

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 12th August 2010
11:00am
Associate Professor of Statistics Faculty of Engineering Department of Management and Engineering University of Padova, Italy
Multi-aspect permutation tests with applications in biomedical studies

In several experimental and observational studies it may happen that the number of observed variables is very much larger than the number of subjects. It can be proved that, for a given and fixed number of subjects, when the number of variables diverges and the noncentrality parameter of the underlying population distribution increases with respect to each added variable, the power of combination-based permutation tests (Pesarin F., Salmaso L.: Permutation tests for complex data: theory, applications and software, Wiley) is monotonically increasing. When testing, e.g., for the equality of two distributions in a two-sample problem with treatment effects presumed to act possibly on more than one aspect, different tests may properly be considered for testing different features of a null hypothesis, leading to the multiple aspect testing issue. Two different aspects may therefore be of interest: the location aspect, based on the comparison of location indexes, and the distributional aspect, based on the comparison of the empirical distribution functions. Combination-based tests allow the experimenter to perform efficient multi-aspect testing even in the presence of mixed variables and missing data. Some application examples from biomedical observational studies, along with a demonstration of the standalone software NPC Test, will be discussed. In particular, such applications will cover, among others, repeated measures designs in ophthalmology and shape analysis. The main focus will be on repeated measures designs and longitudinal surveys with mixed variables and/or missing data.
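The two aspects described above can be combined in a single permutation test roughly as follows. This is only a sketch, not the NPC Test software: it assumes a two-sample problem with one continuous variable, uses the absolute difference in means for the location aspect and the largest gap between empirical distribution functions for the distributional aspect, and combines the partial p-values with Fisher's combining function. Function names, the number of permutations and the example data are all hypothetical.

```python
import math
import random


def mean_diff(x, y):
    """Location aspect: absolute difference between the sample means."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def edf_gap(x, y):
    """Distributional aspect: largest gap between the two empirical CDFs."""
    points = sorted(set(x) | set(y))
    return max(
        abs(sum(v <= t for v in x) / len(x) - sum(v <= t for v in y) / len(y))
        for t in points
    )


def multi_aspect_permutation_test(x, y, n_perm=199, seed=0):
    """Return (partial p-values per aspect, combined global p-value)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    aspects = (mean_diff, edf_gap)
    # Row 0 holds the observed statistics; later rows come from permutations.
    stats = [[f(x, y) for f in aspects]]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stats.append([f(pooled[:n_x], pooled[n_x:]) for f in aspects])
    total = len(stats)

    def partial_p(row, j):
        # Share of rows at least as extreme; never zero since row counts itself.
        return sum(other[j] >= row[j] for other in stats) / total

    # Fisher's combining function, applied to every row with the same permutations.
    combined = [
        -2.0 * sum(math.log(partial_p(row, j)) for j in range(len(aspects)))
        for row in stats
    ]
    partial = [partial_p(stats[0], j) for j in range(len(aspects))]
    global_p = sum(c >= combined[0] for c in combined) / total
    return partial, global_p


# Hypothetical data: two clearly separated groups of 15 subjects each.
group_a = [0.1 * i for i in range(15)]
group_b = [5.0 + 0.1 * i for i in range(15)]
partial, global_p = multi_aspect_permutation_test(group_a, group_b)
```

Reusing the same permutations for every aspect is what lets the combined statistic account for the dependence between the partial tests without modelling it explicitly.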

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Mon 9th August 2010
11:00am
Department of Mathematics University of Bristol
Credible Intervals of the Local Spectrum Estimate
Show Abstract
   
Time series data occur in many disciplines such as finance and medicine. Often there is a dependence structure between time series observations. The typical indicator of this dependence is the covariance function. If a time series is second order stationary then the mean and variance are constant, and the covariance only depends on the time difference between observations. However, many time series are not stationary. One class of non-stationary time series consists of locally stationary time series, which possess slowly evolving second order quantities, such as variance. In these cases models that assume stationarity are inappropriate and alternative methods should be used.
 
An interesting class are locally stationary wavelet models, which can be used to define a localized autocovariance, calculated from an evolutionary wavelet spectrum. This is similar to the spectrum used to analyse stationary time series in the frequency domain, but it is expressed within the wavelet domain and changes through time. The evolutionary wavelet spectrum is estimated from data through the wavelet periodogram. This quantity is asymptotically unbiased but not consistent.
 
We have developed an empirical Bayesian wavelet shrinkage method to smooth the wavelet periodogram and thus improve our estimation of the evolutionary wavelet spectrum. Our method has the advantage of producing prediction intervals and probabilities associated with the evolutionary wavelet estimate. The new methodology will be compared with current techniques.
 
Key words: time series, locally stationary, Bayesian wavelet shrinkage, localized autocovariance, local spectrum prediction intervals.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 30th July 2010
11:00am
Zhejiang University, Hangzhou, China
m-dependent approximation and its applications
Show Abstract

We introduce the m-dependent approximation, an effective approximation method, for a more general class of stationary processes. As applications, under easily verifiable and weaker conditions, we present some limit theorems: a strong invariance principle, and results for the maximum of the periodogram and for spectral density estimation.


   

Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 27th July 2010
11:00am
Institute of Statistical Science Academia Sinica TAIWAN
Patching the Puzzle of Genetic Network
Show Abstract
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 20th July 2010
11:00am
Technische Universitaet Muenchen, Germany
Multivariate financial time series models using pair-copula constructions
Show Abstract

The pair-copula construction method can be used to build flexible multivariate distributions. This class includes drawable (D), canonical (C) and regular vines (see, for example, Kurowicka and Cooke (2006)). The multivariate distribution is built using only bivariate copulas, which can be identified as specific conditional and unconditional bivariate margins. This flexible class is very useful for applications in finance and allows for non-Gaussian dependency structures (see Aas et al. (2009), Czado (2009) and Min and Czado (2010)). I will discuss estimation and model selection methods and give applications to multivariate financial time series to illustrate the potential of these model classes.

 

 

References:

1. Aas, K., Czado, C., Frigessi, A. and Bakken, H. (2009) Pair-copula constructions of multiple dependence, Insurance: Mathematics and Economics, 44, 182-198.

2. Czado, C. (2009) Pair-copula constructions of multivariate copulas, preprint.

3. Kurowicka, D. and Cooke, R.M. (2006) Uncertainty analysis with high dimensional dependence modelling, Wiley & Sons, Chichester.

4. Min, A. and Czado, C. (2010) SCOMDY models based on pair-copula constructions with application to exchange rates, preprint.
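
As an illustration of how a multivariate density can be assembled from bivariate copulas only, the three-dimensional case is commonly written as below. The notation is an assumption made here, not taken from the talk: f_i and F_i are the marginal densities and distribution functions, c_{12} and c_{23} are unconditional pair-copula densities, and c_{13|2} is the copula density of the pair (X_1, X_3) conditional on X_2.

```latex
f(x_1, x_2, x_3)
  = f_1(x_1)\, f_2(x_2)\, f_3(x_3)
    \cdot c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)
    \cdot c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr)
    \cdot c_{13|2}\bigl(F(x_1 \mid x_2), F(x_3 \mid x_2)\bigr)
```

Choosing a different variable to condition on gives a different decomposition of the same density, and in higher dimensions vines are a way of organizing the possible choices.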


Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 15th July 2010
11:00am
Assistant Professor Department of Statistics, Texas A and M
Reduced rank models for spatially correlated functional data
Show Abstract
We present a new method to analyze data from  an experiment using rodent models to investigate the role of p27, an important cell cycle mediator, in early colon carcinogenesis. The responses modeled here are essentially functions nested within a two-stage hierarchy. Standard functional data analysis literature focuses on a single stage of hierarchy and conditionally independent functions with near white noise. However, in our experiment, there is substantial biological motivation for the existence of spatial correlation among the functions, which arises from the locations of biological structures called colonic crypts: this possible functional correlation is a phenomenon we term crypt signaling. Thus, as a point of general methodology, we require an analysis that allows for functions to be correlated at the deepest level of the hierarchy. We developed a reduced rank functional mixed effects model and use splines to model functions. Our methodology uses two sets of functional principal components for dimension reduction to effectively overcome the difficulty in modeling the covariance kernel of a random function and the difficulty in modeling the correlation between functions. Analysis of this data set gives new insights into the structure of p27 expression in early colon carcinogenesis and suggests the existence of significant crypt signaling.

This is based on joint work with Jianhua Huang, Josue Martinez, Arnab Maity, Veerabhadran Baladandayuthapani and Raymond Carroll.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 13th July 2010
11:00am
Arthur Pewsey
Universidad de Extremadura, Cáceres, Spain
Sinh-arcsinh Distributions
Show Abstract

My talk will begin with a description of the 'sinh-arcsinh' transformation, which will then be used to define the sinh-arcsinh family of distributions. When the base generating distribution is standard normal, the 'sinh-arcsinhed normal' (SASN) class of distributions is obtained. This class contains symmetric as well as asymmetric members and allows for tailweights that are heavier or lighter than those of the normal distribution. As will be shown, the SASN class is highly tractable and has many appealing properties. Likelihood based inference for it will also be considered and applied in the analysis of real data. Finally, the options used within the sinh-arcsinh formulation, as well as its extension, will be discussed.
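
A minimal sketch of the transformation and the resulting SASN density follows. It assumes the parametrization S(x) = sinh(delta * arcsinh(x) - eps), with eps controlling skewness and delta > 0 controlling tailweight (delta < 1 giving heavier tails than the normal); the abstract does not state the parametrization, and the function names are hypothetical.

```python
import numpy as np

def sas_transform(x, eps=0.0, delta=1.0):
    """Sinh-arcsinh transformation S(x) = sinh(delta * arcsinh(x) - eps)."""
    return np.sinh(delta * np.arcsinh(x) - eps)

def sasn_pdf(x, eps=0.0, delta=1.0):
    """Density of X when S(X) is standard normal (change of variables)."""
    s = sas_transform(x, eps, delta)
    # dS/dx = delta * cosh(delta * arcsinh(x) - eps) / sqrt(1 + x^2),
    # and cosh(.) = sqrt(1 + sinh(.)^2) = sqrt(1 + s^2).
    jacobian = delta * np.sqrt(1.0 + s**2) / np.sqrt(1.0 + x**2)
    return jacobian * np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)

def sasn_sample(n, eps=0.0, delta=1.0, seed=0):
    """Draw from the SASN class by inverting the transformation."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    return np.sinh((np.arcsinh(z) + eps) / delta)
```

With eps = 0 and delta = 1 the transformation is the identity, so the standard normal distribution is recovered as a member of the class.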


Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 17th June 2010
11:00am
Emeritus Professor of Civil Engineering, UBC
Duration of Load Effect in Lumber (DOL) - Experimental test results and adjustment of a damage accumulation model
Show Abstract
The strength of wood depends on the load history to which it has been subjected. This phenomenon is similar to that of fatigue in metals, except that in wood it is present even under statically applied loads. Thus, a wood beam may not fail immediately upon load application, but if the load is sustained over time, the beam may eventually fail. The physics or mechanism of this strength degradation phenomenon is complicated and not well understood.

Prediction of  the time-to-failure  T  has depended on long-term testing under constant loads, obtaining the probability  distribution of T. In order to extrapolate to other load histories, damage accumulation laws have been proposed. These laws incorporate random parameters that can be calibrated to represent the observed variability in T.

This seminar will describe a large testing program completed in Canada in the 1970’s, still quite unique in the world, using 2 x 6 Hemlock lumber. The talk will then concentrate on the damage model (known as the “Canadian model”) developed to represent the data and to extrapolate to other load regimes (snow, occupancy, etc.).  Other proposed models will also be discussed.  Finally, the talk will describe how the model was put to use in Canada to perform reliability analyses and  to derive design guidelines for wood structures.
 
 
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 10th June 2010
4:00pm
Jun Chen, Eric Cormier, Eric Fu, Yitian Liang, Corinne Riddell, Kevin Ushey, Xiaoyin Zhong
Department of Statistics, UBC
SSC Case Study
Show Abstract

Three teams of graduate students from UBC participated in a case study poster competition at the SSC 2010 conference. Following a brief introduction of the case study, each team will discuss how they approached the problem, and their experience / hardships with the data analysis and poster creation. A brief description of the case study is included here:

 

Angiotensin I-converting enzyme (ACE) inhibitors are an important class of drugs that have been in use for nearly 30 years in the treatment of cardiovascular diseases such as hypertension and congestive heart failure. Despite being effective pharmaceutical agents, these drugs have side effects. These serious side effects have been attributed to bradykinin (a pro-inflammatory peptide and potent vasodilator), which has a short half-life and is rapidly inactivated in plasma by two exopeptidases, ACE and aminopeptidase P (APP). Bradykinin is also transformed by carboxypeptidase N (CPN) into the active metabolite des-Arg9-bradykinin (ARG), which in turn is inactivated by ACE and APP. Even though potentially deadly side effects have been attributed to bradykinin, there is no experimental evidence. Consequently, the primary objective of this case study is to characterize the activation metabolism of bradykinin and des-Arg9-bradykinin in plasma and their role in angioedema.

 

(A full copy of the case study is available at http://www.ssc.ca/en/education/archived-case-studies/ssc-case-studies-2010-metabolism-of-bradykinin-and-endogenous)

BRG
WMAX110 (PIMS), 1933 West Mall, UBC
Thu 3rd June 2010
4:00pm
Johns Hopkins Bloomberg School of Public Health
Joint Analysis of Multiple Genome-wide Chromatin Immunoprecipitation Experiments
Show Abstract

Chromatin immunoprecipitation (ChIP) followed by genome tiling array hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) is a powerful approach to identify genome-wide protein-DNA interactions. It is widely used to study gene regulatory networks. With the rapid growth of ChIP-chip and ChIP-seq data in public repositories, it is becoming more and more common that multiple datasets related to the same TF, pathway or biological system are collected. When multiple related ChIP datasets are available, analyzing them jointly not only allows one to study commonality and context-dependency of protein-DNA associations, but also creates opportunities to borrow information across datasets to improve statistical inference. This is particularly useful if the data of primary interest are noisy and information from other datasets is required to distinguish signals from noise. We propose a hierarchical mixture model and develop an R package JAMIE to perform the joint analysis. The model captures the correlation among datasets, which provides a basis for sharing information across experiments. The number of parameters in the model grows linearly rather than exponentially when the number of datasets increases. Real data tests illustrate the advantage of JAMIE over the traditional approach that treats individual datasets separately.
    
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 1st June 2010
11:00am
Department of Statistics Stanford University
Valid Representations of Complex Data
Show Abstract
 

One of the main challenges in today's data-rich environment is finding useful methods for data integration. I will show examples of analyses of data from the biological world where the difficulties arise from the heterogeneity of the data involved. I will show examples of combining trees, graphs or spatial data using distances and kernels.


    
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 29th April 2010
4:00pm
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
CANCELLED
Show Abstract
 
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 27th April 2010
11:00am
Department of Biostatistics Harvard University
CANCELLED
Show Abstract
 
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 22nd April 2010
3:00pm
MSc Candidate Department of Statistics, UBC
Statistical Co-op experiences at BC Center for Excellence in HIV/AIDS (St. Paul’s Hospital)
Show Abstract
I will share my co-op experiences working at BC Center for Excellence in HIV/AIDS. Included in this talk will be things I have learned from this experience as well as some suggestions for future co-op students. An example of the projects that I was heavily involved with will be discussed in detail. This project compared the V3 loop genotypic population sequencing and 454 “deep sequencing” in determining HIV co-receptor as well as predicting virologic outcomes.  
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 20th April 2010
11:00am
TELECOM ParisTech, France
Online EM Algorithm for Latent Data Models
Show Abstract

With the generalization of sources of information that generate sustained high volumes of data, there has recently been a renewed interest in online (or recursive) estimation for various statistical models. In this talk, I consider a version of the Expectation-Maximization (EM) algorithm that can be used for online estimation in latent data models with independent observations. The general principle of the approach is to use a stochastic approximation scheme, in the domain of sufficient statistics, as a proxy for a limiting EM recursion. Depending on time and interest, I may also discuss the merit of this approach when used for batch estimation as well as ongoing work to extend the method to the case of hidden Markov models.

 

(Joint work with Eric Moulines)
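
The principle of running a stochastic approximation on the sufficient statistics can be sketched for a simple latent data model. The choices below are assumptions made for illustration only: a Gaussian mixture with unit variances, a step size (n + offset)^(-0.6), and the hypothetical names `normal_pdf` and `online_em_mixture`.

```python
import numpy as np

def normal_pdf(y, mean):
    """Unit-variance normal density."""
    return np.exp(-0.5 * (y - mean) ** 2) / np.sqrt(2.0 * np.pi)

def online_em_mixture(ys, init_means, step_exponent=0.6, offset=10):
    """Online EM for a Gaussian mixture with known unit variances.

    Each observation is used once: the running sufficient statistics are
    moved towards their conditional expectation for the new observation,
    and the parameters are then re-derived from those statistics.
    """
    means = np.asarray(init_means, dtype=float).copy()
    weights = np.full(len(means), 1.0 / len(means))
    s0 = weights.copy()      # running estimate of E[1{component = k}]
    s1 = weights * means     # running estimate of E[1{component = k} * y]
    for n, y in enumerate(ys, start=1):
        # E-step for a single observation: posterior component probabilities.
        resp = weights * normal_pdf(y, means)
        resp /= resp.sum()
        # Stochastic approximation update of the sufficient statistics.
        gamma = (n + offset) ** (-step_exponent)
        s0 += gamma * (resp - s0)
        s1 += gamma * (resp * y - s1)
        # M-step: map the statistics back to the parameters.
        weights = s0.copy()
        means = s1 / s0
    return weights, means
```

The offset keeps the first step sizes well below one, so a single early observation cannot overwrite the initial statistics and collapse the component means onto each other.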

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 15th April 2010
4:00pm
Department of Obstetrics and Gynaecology School of Population and Public Health, UBC and the BC Women's Hospital & Health Centre
Trial and error: Interpretation of randomized trials
Show Abstract
 

Evidence from randomized trials is considered mandatory for assessing the effects of drugs and other therapies, and there is a general consensus regarding the central concepts involved in state-of-the-art randomized trials. Nevertheless, some confusion arises because interpretation of randomized trials requires both methodologic and substantive expertise. This interactive session uses examples from the clinical literature to discuss concepts that are central to the interpretation of randomized trials. Specifically, the session highlights how features of the randomized trial, such as randomization/stratification, blinding and use of placebo or sham therapy, serve to ensure comparability of populations, effects and information.


    
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 13th April 2010
1:00pm
Song Cai
MSc candidate Department of Statistics, UBC
Stochastic Process Based Regression Modeling of Time-to-event Data: Application to phenological data
Show Abstract

We aim to study the response of timings of phenological events, such as bud-bursting, blooming, and fruiting, to climate variables, especially daily average temperatures, and to predict future phenological events. The timing of a phenological event is a special type of time-to-event data, and daily average temperature is a time-varying covariate associated with it. Traditional models in survival analysis that are frequently used for dealing with time-dependent covariates are the Cox model and parametric proportional hazards models. However, these models encounter difficulties in our context. The Cox model is not efficient when there is an obvious trend in covariates and it is not generally suitable for prediction. At the same time, the proportional hazards models involve complicated integration without a closed-form solution when complicated time-dependent covariates are present. Also, they usually require quite strong distributional assumptions. We developed a stochastic process based regression model for phenological data. Compared with the Cox model, this model is more efficient because it uses all the time-dependent covariate information, and it is suitable for making predictions. Compared with parametric proportional hazards models, the fitting of this model is computationally less demanding, and the model is less restrictive in its assumptions. With some extra mild assumptions, this model can be easily extended to incorporate sequential events as responses. It may also be useful for a broad range of survival data in medical studies. The application of our model to the bloom dates data in the Okanagan region of British Columbia shows that our model makes sense!


BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 8th April 2010
4:00pm
Lemuel Shattuck Research Professor of Statistical Science and Member of the Faculty of Arts and Sciences Department of Biostatistics, Harvard School of Public Health
Should the Analysis of Multi-Center Trials be Guided by the Trial Design: Basic ideas and insights
Show Abstract
 

Consider a multi-center randomized clinical trial. Should the analysis be guided by the design of the trial? Most investigators would answer in the affirmative. Yet in practice the design and many important features of most trials are ignored in the analysis. Most analyses of clinical trials assume that the trial has a random sample of patients from some well defined population. This is the basic assumption of most statistical methods employed to analyze trials in which the inference is targeted at drawing conclusions about a well defined population. In truth there is no random sample of patients, nor is there a well defined population of patients. The patients in a trial can best be described as a “collection”, which is defined as the complement of a random sample. Conclusion: most randomized clinical trials are analyzed incorrectly, as the basic assumption of a random sample is not true. However, the basis of the inference can rely on the randomization process. Analytical techniques can be derived which depend only on the randomization process. However, the resulting inference will be a “local” inference in that it will only apply to the patients who have entered the study. Another basic tenet in any analysis is to take into account factors that affect the outcome. Many multi-center trials have large institutional variation. This is especially true in drug trials where the institution’s patient management and support may influence the observed toxicity. However, efficiently accounting for institutional variation may be difficult, as many multi-center trials have large numbers of centers, each of which typically enters a small number of patients. In this lecture, methods will be described for making inferences which only rely on the randomization process, but which also account for institutional variation. These methods have been adapted to account for permuted blocks, which are typically used to design the randomization allocation in many trials. The methods generally result in greater power when compared to statistical methods which tend to ignore both institutional variation and permuted blocks. The methods have been adapted to group sequential trials.

 

    
Statistics
MATH 204, 1984 Mathematics Road, UBC
Tue 30th March 2010
4:00pm
University Distinguished Professor Department of Statistics The Pennsylvania State University
BIG Statistics
Show Abstract
 

In the past decades, we have witnessed the revolution of information technology. Its impact on statistical research is enormous. This talk attempts to address recent developments and some potential research issues in Business, Industry and Government (BIG) Statistics, with special focus on computer experiments and information systems. An overall introduction and review will be given, followed by specific research potentials. For each subject, the problem will be introduced, some initial results will be presented, and future research problems will be suggested. If time permits, I will also discuss some recent advances in Search Engine and RFID study.

Slides of his talk can be downloaded at the website http://www.personal.psu.edu/users/j/x/jxz203/lin/Lin_pub/

 

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 25th March 2010
4:00pm
Associate Professor, Department of Pediatrics AHFMR Health Scholar Director, Biostatistics Consulting Group / Team Lead - Biostatistics, WCHRI
Detection of Clusters of Disease Cases and Disease-Related Events
Show Abstract
 

Traditional approaches to statistical disease cluster detection focus on the identification of geographic areas with high numbers of incident or prevalent cases of disease. Events related to disease may be more appropriate for analysis in some contexts. I compare these approaches when the detection of aggregations of cases or events is conducted by testing individual administrative areas that may be combined with their nearest neighbours.  The population and cases or events per case for each area as well as a nearest neighbour spatial relationship are required. I also investigate the power of the tests when implementing a testing algorithm.  The methodology is illustrated on presentations to emergency departments.

 

    
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 23rd March 2010
11:00am
MSc candidate, Department of Statistics, UBC
Co-op at Agriculture and Agri-Food Canada
Show Abstract
 

First I will present an overview of my experience as a co-op student in the Lethbridge Research Center at Agriculture and Agri-Food Canada. Then, I will focus on a particular project in which I was involved. This project consists of performing a sensitivity analysis on an ecosystem model. The aim of this analysis is to identify which inputs of the Biome-BGC ecosystem model explain the variability of two outputs, soil moisture and plant productivity, for spring wheat in North America. There are several methods that can be used to perform a sensitivity analysis. The analysis presented is done using Sobol’s method, which is implemented by the SIMLAB software.
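
For readers unfamiliar with variance-based sensitivity analysis, first-order Sobol indices can be estimated as sketched below. This does not reproduce SIMLAB or the Biome-BGC analysis: it assumes independent inputs uniform on [0, 1], uses a standard "pick-freeze" Monte Carlo estimator, and the function name `first_order_sobol` is hypothetical.

```python
import numpy as np

def first_order_sobol(model, n_inputs, n_samples=50000, seed=0):
    """Monte Carlo estimate of first-order Sobol indices.

    `model` maps an (n_samples, n_inputs) array to n_samples outputs.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n_samples, n_inputs))
    b = rng.random((n_samples, n_inputs))
    fa, fb = model(a), model(b)
    both = np.concatenate([fa, fb])
    total_var = np.var(both)
    fb_centred = fb - both.mean()   # centring reduces Monte Carlo noise
    indices = np.empty(n_inputs)
    for i in range(n_inputs):
        # Matrix A with only column i taken from B.
        ab = a.copy()
        ab[:, i] = b[:, i]
        # Output share explained by input i alone.
        indices[i] = np.mean(fb_centred * (model(ab) - fa)) / total_var
    return indices

def toy_model(x):
    """Illustrative stand-in for an ecosystem model: the third input is inert."""
    return x[:, 0] + 2.0 * x[:, 1] + 0.0 * x[:, 2]
```

For the toy model the exact indices are 0.2, 0.8 and 0, so the estimates returned by `first_order_sobol(toy_model, 3)` can be checked against known values.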


  
  
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 23rd March 2010
11:00am
MSc candidate Department of Statistics, UBC
Publication bias and other issues facing researchers, statisticians
Show Abstract

Many applied statisticians are involved in research that will provide a body of literature necessary in the development of public policy. However, there are often certain pressures that distort this body of knowledge. One major issue is the lack of publication of studies which do not obtain statistically significant results, even though, through meta-analysis, they could help contribute to an overall statistically significant conclusion. In addition, it is still common that research sponsored by entities with a financial interest in achieving favourable results is more likely to actually report favourable results, and similarly, research showing the opposite is often strongly attacked and discredited. In this seminar, I focus on a few examples pertaining to drug development, product regulation, and climate research, and how these issues can affect the statisticians involved.


Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 9th March 2010
11:00am
Rebekah Mohr
MSc candidate, Department of Statistics, UBC
The Shell Co-op Experience
Show Abstract
The co-op experience, specifically relating to the experience at Shell, will be discussed.  Included in this discussion will be the logic behind why it may or may not be in one's interest to complete a co-op, advice for future co-op students, and what I learned from my co-op experience.  The types of opportunities that I was given within Shell will be presented, and an example of a project I completed during my term will be discussed in depth.  This project enabled the adoption of a global tool by transitioning the department from the currently used local tool for project documentation storage into the global tool.

 

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 9th March 2010
11:00am
Stephanie Cheng
MSc candidate, Department of Statistics, UBC
Biostatistical co-op at St. Paul's Hospital and Oxford Outcomes
Show Abstract
 

In this talk, I will share my co-op experiences working at St. Paul's Hospital and at Oxford Outcomes and talk about my decision to pursue the co-op option. This talk will also include a detailed look into three projects I was heavily involved in during my 8-month work term: 1. a meta-analysis of immunosuppressant therapies post-transplant, 2. an investigation into the hip fracture incidence rates in British Columbia over the past decade and 3. a comprehensive literature review on global hip fracture rates.


    
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 2nd March 2010
11:00am
School of Public Health, Drexel University. Community & Occupational Health, University of Alberta.
Autism spectrum disorders and foetal hypoxia: Statistical challenges in analysis of a population-based cohort
Show Abstract

Igor Burstyn (1, 2)

1: Department of Environmental and Occupational Health, School of Public Health, Drexel University, Philadelphia, PA, USA

2: Community and Occupational Medicine Program, Department of Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, Canada
 

Background: The autism spectrum disorders (ASD) are a group of rare impairments of neurodevelopment that manifest prior to 3 years of age and are associated with impaired verbal and non-verbal communication and social interaction, and restricted and repetitive patterns of behaviour.  ASD reduces quality of life in affected children and their parents, and leads to extraordinary economic costs for society.  A recent review limited to only high-quality articles, on the basis of circumstantial epidemiological evidence, advanced a hypothesis that foetal hypoxia is implicated in ASD, but this hypothesis was never tested directly.  There is some data to suggest that foetal hypoxia is likely to affect boys, but not girls.

Study design: Provincial delivery records (PDR) identified the cohort of 218,890 singleton live births in the province of Alberta, Canada, between 01/01/98 and 31/12/04.  These were followed-up for ASD via ICD-9 diagnostic codes assigned by physician billing until 31/03/08.  Maternal and obstetric risk factors, as well as measures of foetal hypoxia, were extracted from PDR.

Statistical challenges: [1] Estimates of prevalence of ASD varied from 3/1000 (two services by any combination of psychiatrist or paediatrician) to 5.2/1000 (one claim by any physician). Actual time of onset of ASD is unknown, but can precede diagnosis by quite some time. Therefore, outcome misclassification is likely and there is no gold standard, such as records of assessments from specialized clinics, although we can guess sensitivity and specificity from a similar Canadian study. [2] Foetal hypoxia (exposure) was measured using 3 different tests and not all 3 tests were performed on all subjects tested. These tests are measured on a continuous scale and are dichotomized on the basis of clinical guidelines. Therefore we have a measurement error problem aggravated by dichotomization of a mismeasured variable, which can produce non-ignorable differential exposure misclassification. [3] For half of the subjects, the test of hypoxia was not performed (deemed to be very unlikely to be positive?). Therefore, there is severe missingness that is likely to fail to meet the missing-at-random (MAR) assumption. [4] ASD is a very rare outcome, leading to zero-inflation.

Some results: We ignored complications [1], [2] and [4], but applied the Expectation-Maximization (EM) algorithm to problem [3], modeling the probability of exposure among missing values using suspected covariates of foetal hypoxia such as low Apgar score, C-section, low birth weight, etc. A simple correction for deviation from the MAR assumption was attempted in sensitivity analysis. Compared to complete-case analysis, the EM algorithm resulted in a gain of precision and a borderline “significant” effect in the expected direction among boys. Further adjustment for even a small deviation from the MAR assumption dramatically alters inference (if one follows the traditions of the biomedical literature) about the effect of foetal hypoxia on ASD risk among full-term boys, supporting the a priori hypothesis. Apparently an important result, but can it be trusted if we have been naïve about the uncertainty associated with ignoring all other challenges posed by the data?

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 4th February 2010
11:00am
Department of Mathematics & RiskLab ETH, Zurich, Switzerland
Asymptotic independence for unimodal densities
Show Abstract

Asymptotic independence of the components of random vectors is a concept used in many applications. The standard criteria for checking asymptotic independence are given in terms of distribution functions (dfs). Dfs are rarely available in an explicit form, especially in the multivariate case. Often we are given the form of the density or, via the shape of the data clouds, one can obtain a good geometric image of the asymptotic shape of the level sets of the density. In the talk, a simple sufficient condition for asymptotic independence in terms of this asymptotic shape for light-tailed densities will be presented. This condition extends Sibuya's classic result on asymptotic independence for Gaussian densities.

 

 

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 28th January 2010
11:00am
Veronica Berrocal
SAMSI North Carolina
Downscaling outputs from numerical models
Show Abstract
In many environmental disciplines, data often arise from two sources: numerical models and monitoring networks. The first source provides predictions at the level of grid cells and is characterized by full spatial coverage of the region of interest, high temporal resolution, no missing data, but consequential calibration concerns. The second gives measurements at points, tends to be sparsely collected in space with coarse temporal resolution, often with missing data but, where recorded, provides, essentially, the true value. Integrating the two sources of data has been a widely investigated topic among several communities: from atmospheric scientists (a notable example is the data assimilation literature) to statisticians.

In this talk, I will first briefly review common approaches for integrating monitoring data and computer model output, then I will propose an attractive, fully model-based strategy to combine the two sources of data, focusing mostly on the change of support problem with the goal of downscaling the output from numerical models to point level.

I will present the downscaler model in both a univariate and bivariate setting, introducing the models first in a purely spatial setting, and then showing how they can be easily extended to accommodate the temporal dimension. Using an application on air quality, I will show how our downscaler model, which employs underlying correlated Gaussian processes, provides better predictive performance than traditional geostatistical techniques and Bayesian Melding (Fuentes and Raftery, 2005). I will conclude by discussing further avenues to extend the approach to incorporate Dirichlet Processes and Markov Random Fields as well as to develop a process-driven spatially-varying weighted downscaler.
 


Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 26th January 2010
11:00am
Computer Science Division University of California, Berkeley
Probabilistic Models of Evolution and Language Change

Both linguistics and biology face scientific questions that require reconstructing the ancestral forms of discrete sequences from their modern descendants.  In linguistics, these questions are about the words that appeared in the protolanguages from which modern languages evolved.  Linguists painstakingly reconstruct these words by hand using knowledge of the relationships between languages and the plausibility of sound changes.  In biology, analogous questions concern the DNA, RNA, or protein sequences of ancestral organisms.  By reconstructing ancestral sequences and the evolutionary paths between them, biologists can make inferences about the evolution of gene function and the nature of the environment in which they evolved.

 

In this talk, I will give an overview of the main challenges in the field, and show how we addressed two critical difficulties encountered in previous approaches.  The first difficulty comes from the need to fit rate matrices and birth-death parameters of Continuous Time Markov Chains (CTMCs), and to obtain marginals from these CTMCs for different branch lengths.  While these operations are easily done in pure substitution models, the equivalent task in all but the simplest InDel models is highly non-trivial.  The second difficulty comes from the need to evaluate partition functions and take expectations over the exceedingly large space of evolutionary derivations.

 

I will also present an application to gappy multiple sequence alignment, and a new characterization of sound change obtained from the model.
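
As a minimal illustration of the CTMC marginals mentioned in the abstract above, the sketch below computes transition probabilities P(t) = exp(Qt) for several branch lengths under the Jukes-Cantor substitution model, a simple pure substitution case. The choice of Jukes-Cantor, the rate `alpha`, and all function names are assumptions made for illustration; they are not taken from the talk, and the InDel models it discusses are considerably harder.

```python
import math

NUCLEOTIDES = "ACGT"


def jukes_cantor_rate_matrix(alpha):
    """Rate matrix Q: every substitution has rate alpha; each row sums to zero."""
    n = len(NUCLEOTIDES)
    return [[-(n - 1) * alpha if i == j else alpha for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Plain dense product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by scaling and squaring with a Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    step = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # After this update, term holds step^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, step)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def jukes_cantor_closed_form(alpha, t):
    """Known closed-form solution for the Jukes-Cantor model, used as a check."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


# Hypothetical rate and branch lengths, chosen only for the example.
ALPHA = 0.1
BRANCH_LENGTHS = (0.1, 1.0, 10.0)
Q = jukes_cantor_rate_matrix(ALPHA)
marginals = {t: transition_matrix(Q, t) for t in BRANCH_LENGTHS}
```

Each entry of `marginals[t]` is the probability of observing one nucleotide at the end of a branch of length `t` given another at its start. For this model the numerical result can be compared against `jukes_cantor_closed_form`, which is one reason pure substitution models are regarded as the easy case.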

 

BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 21st January 2010
4:00pm
Department of Statistics Stanford University
A Penalized Matrix Decomposition, with Application to Sparse Clustering

We present a penalized matrix decomposition, a new framework for computing a low-rank approximation for a matrix. This low-rank approximation is a generalization of the singular value decomposition.  While the singular value decomposition usually yields singular vectors that have no elements that are exactly equal to zero, our new decomposition results in sparse singular vectors. When this decomposition is applied to a data matrix, it can yield interpretable results. Moreover, when applied to a dissimilarity matrix, this leads to a method for sparse hierarchical clustering, which allows for the clustering of a set of observations using an adaptively-chosen subset of the features. These methods are demonstrated on the Netflix data and on a genomic data set.

This is joint work with Robert Tibshirani and Trevor Hastie.
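
To make the idea of sparse singular vectors concrete, here is a minimal sketch of a rank-one approximation computed by alternating power-iteration updates with soft-thresholding. The abstract does not specify the algorithm, so this is an assumed illustration rather than the speakers' method: the fixed thresholds `lam_u` and `lam_v`, the function names, and the toy matrix `X` are all hypothetical.

```python
import math


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    magnitude = abs(x) - lam
    if magnitude <= 0.0:
        return 0.0
    return magnitude if x > 0 else -magnitude


def normalize(vec):
    """Scale vec to unit Euclidean length (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else list(vec)


def sparse_rank_one(x, lam_u, lam_v, iterations=50):
    """Alternate thresholded updates of u and v; return (d, u, v)."""
    n_rows, n_cols = len(x), len(x[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    u = [0.0] * n_rows
    for _ in range(iterations):
        # u <- normalized soft-threshold of X v
        u = normalize([
            soft_threshold(sum(x[i][j] * v[j] for j in range(n_cols)), lam_u)
            for i in range(n_rows)
        ])
        # v <- normalized soft-threshold of X^T u
        v = normalize([
            soft_threshold(sum(x[i][j] * u[i] for i in range(n_rows)), lam_v)
            for j in range(n_cols)
        ])
    d = sum(u[i] * x[i][j] * v[j]
            for i in range(n_rows) for j in range(n_cols))
    return d, u, v


# Toy matrix: a strong block in rows 0-2 and columns 0-1, plus a weak background.
X = [
    [2.1, 2.1, 0.1, 0.1, 0.1],
    [2.1, 2.1, 0.1, 0.1, 0.1],
    [2.1, 2.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
d, u, v = sparse_rank_one(X, lam_u=0.5, lam_v=0.5)
```

On this toy matrix the thresholding sets the entries of `u` and `v` outside the strong block to exactly zero, whereas an ordinary singular value decomposition would generally leave small nonzero values there.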

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 15th January 2010
10:00am
Department of Statistics Harvard University
Staring at the Black-Box: Statistical Inference in the Physical Sciences

Many modern statistical applications involve noisy observations of an underlying process that can best be described by a complex deterministic system. In fields such as astronomy, astrophysics and the environmental sciences, these systems often involve the solution of partial differential equations that represent the best available understanding of the physical processes. Statistical computation in this context is typically hampered by either look-up tables or expensive “black-box” function evaluations. We present an example from astrophysics with a “look-up table likelihood”: the analysis of stellar populations. Astrophysicists have developed sophisticated models describing how intrinsic physical properties of stars relate to observed photometric data. The mapping between the parameters and the data-space cannot be solved analytically and is represented as a series of look-up tables. We present a flexible hierarchical model for analyzing stellar populations. Our computational framework is applicable to many “black-box” settings, and robust to the structure of the black-box. The performance of various sampling schemes will be presented, together with the results for an astronomical dataset.

This is joint work with Xiao-Li Meng, Andreas Zezas and Vinay Kashyap.

Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 7th January 2010
11:00am
Epigenetics Laboratory, Garvan Bioinformatics Division, Australia
Applied Statistics in Modern Molecular Biology


Biologists are now collecting more data than ever before, a trend that appears to be accelerating.  For example, DNA microarrays have been in mainstream use for more than a decade and hundreds of thousands of experiments have been conducted.  Due to the reasonably low cost, new platforms and designs are still becoming available.  Recently, second generation sequencing (2GS) platforms, a fundamentally different technology, have become available and are being applied to various biological questions.  Though currently more expensive, 2GS is proving to be more sensitive and is already providing unprecedented quantities of data.  The third generation of sequencing platforms promises a further explosion of data.  Some statistical analyses of these data are standard, but most are not.  In this talk, I will give an overview of the technologies, data types, and applications, along with a handful of examples from my own research, thus highlighting the role of applied statistics in this field.
 
 
 
 
 

Department of Statistics, University of British Columbia
3182 Earth Sciences Building
2207 Main Mall
Vancouver, BC, Canada V6T 1Z4
Tel: 604.822.0570
Fax: 604.822.6960
