Seminars
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 24th November 2005
4:00pm
Department of Biostatistics,
University of Iowa.
Nonparametric Inference for Panel Count Data
We study a simple nonparametric inference procedure for panel count data, a complicated type of data that often appears in clinical trials. We propose an easy-to-implement nonparametric estimation method for the mean function of a counting process by maximizing a pseudo-likelihood function established from a non-homogeneous Poisson process. We show that this estimation method is nevertheless robust, in the sense that it remains valid for counting processes other than Poisson. We also show that the estimator has a slower than root-n convergence rate and describe the point-wise asymptotic distribution of the estimator. However, these asymptotic properties are not very useful in practice for making inference about the underlying counting process. Finally, we derive the asymptotic normality of a smooth functional of the estimator. This smooth functional is easily estimated and hence warrants a useful inference procedure for panel count data.
We further propose a simple nonparametric test for comparing the mean functions among k independent samples. The test is validated through various simulation studies and demonstrated with two real-life examples.
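As a rough illustration of the kind of estimator the abstract describes, the sketch below assumes one common formulation that is not spelled out in the abstract: if counts at different observation times are treated as independent Poisson variables, the pseudo-likelihood is maximized by a weighted isotonic regression of the average cumulative count at each distinct observation time, with weights equal to the number of observations at that time. The function names and the data layout (a list of subjects, each a list of (time, cumulative count) pairs) are hypothetical.

```python
def weighted_isotonic(values, weights):
    """Weighted least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def mean_function_estimate(panel):
    """panel: list of subjects, each a list of (time, cumulative_count) pairs.

    Returns the distinct observation times and a non-decreasing estimate of
    the mean function at those times (assumed formulation, see lead-in).
    """
    totals, counts = {}, {}
    for subject in panel:
        for t, n in subject:
            totals[t] = totals.get(t, 0.0) + n
            counts[t] = counts.get(t, 0) + 1
    times = sorted(totals)
    averages = [totals[t] / counts[t] for t in times]
    weights = [counts[t] for t in times]
    return times, weighted_isotonic(averages, weights)
```

The pooling step is what enforces the monotonicity that a mean function of a counting process must have; the raw averages alone can decrease between observation times.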
BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 10th November 2005
4:00pm
Department of Biostatistics,
Harvard University.
Comparing the joint distribution of multiple categorical variables between two groups: with application to analysis of pre/post HIV-1 genotype sequences
There are currently 20 FDA-approved antiretroviral drugs for the treatment of HIV-1 infection. Because an infected subject's virus population may develop resistance mutations to one or more drugs, choice of the most effective drug combination is a major challenge facing clinicians treating this disease. Consider a recent AIDS clinical trial where an HIV-1 genotype sequence was recorded for each subject at baseline, and, for those subjects who subsequently failed the study regimen, at the time of virological failure. Consider the scientific problem of identifying changes in the HIV-1 genotype sequence that are associated with the risk of virological failure. To provide a statistically sound solution, a two-stage resampling-based algorithm is proposed that, for arbitrary data-generating mechanisms, asymptotically controls the associated false-positive rate at any pre-specified level. The first stage estimates the components of the joint distribution of HIV-1 genotype sequence that differ between pre/post measurements. The second stage attempts to eliminate from this set of components those variable/level combinations that are not meaningful, in the sense that the difference can be completely explained by pre/post differences with respect to lower-order combinations. In addition to a detailed analysis of this clinical trial, a simulation study is presented that evaluates the methodology for both the paired- and independent-sample case.
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 27th October 2005
4:00pm
Department of Statistics,
Iowa State University.
Two Applications of Nonparametric Regression in Survey Estimation
We describe two survey estimation areas in which nonparametric techniques can readily be incorporated to improve the precision of estimators. Penalized spline regression, a new and easy-to-use nonparametric method, is used in both cases. In the first area, we will show how model-assisted estimation can be extended from the commonly used linear and ratio models to incorporate nonparametric and semiparametric models. The applicability of the estimator is illustrated using an example from a forest health monitoring survey in Utah (USA). For the second area, we propose a new approach for performing small area estimation, in which the mean function is nonparametrically specified. This is illustrated using data from an ecological health survey of lakes in the Northeastern states of the USA.
Statistics / BRG
BC Cancer Agency
Thu 6th October 2005
3:00pm
Department of Statistics,
UBC.
Topics in Bayesian Statistics (TBA)
Place:
BC Cancer Agency,
Lecture Theatre,
675 W 10th Ave
Abstract:
Please join us for the second seminar in this series! The UBC Statistics Department and the SFU Statistics and Actuarial Science Department will be jointly hosting one seminar per term at a central location in Vancouver. These seminars are intended to be informal, and at a level accessible to graduate students. The goal is to create a cohesive community of statisticians in the GVRD, and, in particular, to increase the interaction among faculty and students at UBC and SFU. To this end, there will be an intermission (with complimentary refreshments) between talks... and we encourage attendees to take advantage of the opportunity to have dinner together!
For our second event, we have selected two prominent researchers in Bayesian statistics, Professors Paul Gustafson (UBC) and Tim Swartz (SFU). We extend a special invitation to students who are working in this field.
For more information, please contact Rachel Altman (SFU) or Jason Loeppky (UBC). The event is free, but we ask that you register by emailing Jason Loeppky so that we can order refreshments appropriately.
Statistics / BRG
BC Cancer Agency
Thu 6th October 2005
3:00pm
Statistics and Actuarial Science Department,
SFU
Topics in Bayesian Statistics (TBA)
Place:
BC Cancer Agency,
Lecture Theatre,
675 W 10th Ave,
Vancouver
Abstract:
Please join us for the second seminar in this series! The UBC Statistics Department and the SFU Statistics and Actuarial Science Department will be jointly hosting one seminar per term at a central location in Vancouver. These seminars are intended to be informal, and at a level accessible to graduate students. The goal is to create a cohesive community of statisticians in the GVRD, and, in particular, to increase the interaction among faculty and students at UBC and SFU. To this end, there will be an intermission (with complimentary refreshments) between talks... and we encourage attendees to take advantage of the opportunity to have dinner together!
For our second event, we have selected two prominent researchers in Bayesian statistics, Professors Paul Gustafson (UBC) and Tim Swartz (SFU). We extend a special invitation to students who are working in this field.
For more information, please contact Rachel Altman (SFU) or Jason Loeppky (UBC). The event is free, but we ask that you register by emailing Jason Loeppky so that we can order refreshments appropriately.
Statistics / BRG
UBC Okanagan
Sat 1st October 2005
1:30pm
Department of Mathematics and Statistics,
University of British Columbia Okanagan,
3333 University Way,
Kelowna,
BC
Pacific Northwest Statistics Meeting
Statistics / BRG
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 2nd August 2005
4:00pm
Professor Youngjo Lee
Department of Statistics,
Seoul National University.
Double hierarchical generalized linear models
We propose a class of double hierarchical generalized linear models (DHGLMs) in which random effects can be specified for various components of models. Heteroscedasticity between clusters can be modelled by introducing random effects in the dispersion model, as is heterogeneity between clusters in the mean model. This class will, among other things, enable models with heavy-tailed distributions to be explored, allowing robust inferences from the GLM class of models, including Poisson and binomial GLMs, and their extension to generalized linear mixed models (GLMMs). The extended likelihood score equations for the new models have bounded influence against outliers. The resulting estimators are robust against outliers, while maintaining high efficiency in the absence of outliers.
In random-effect models there has been great concern about the choice of distribution, because it is difficult to identify the distribution with limited data, and parameter estimators are vulnerable to the distributional assumptions. Sensitivity in parameter estimation appears substantially worse in nonrandomly ascertained data. We show that such sensitivity can be eliminated through the use of DHGLMs. This modelling approach extends easily to models for survival data, allowing left truncation etc. We also show how it can be used to improve current methods of wavelet estimation.
The h-likelihood provides a unified framework for fitting this new class of models and gives a single algorithm for fitting all members of the class. This algorithm does not require quadrature or prior probabilities.
Note added by host: a reference to an earlier publication is: "Lee, Y., and Nelder, J.A. (2001), Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions, Biometrika, 88 (4), 987-1006"
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 24th May 2005
4:00pm
Xuekui Zhang, MSc Candidate
Department of Statistics,
UBC.
Simultaneous Non-Parametric Constrained Regressions of Unbalanced Longitudinal Data
Suppose 126 mosquito wings are collected, and the coordinates of 100 points along the edge of each wing are measured. Our work can be applied to these data to fit functions that describe the shape of the wings, to describe the differences between the shapes, and to classify the mosquitoes by the shape of their wings.
The aim of our work is to find simultaneous non-parametric regressions of unbalanced longitudinal periodic data. We modify a method proposed in Philippe C. Besse, Herve Cardot, & Frederic Ferraty (1997) to take the periodic property of the data into account.
First we find a way to enforce the periodic property by linear constraints. Then we propose three modifications of the method in Besse et al. In this paper we only apply our methods to longitudinal periodic data, but our methods can deal with longitudinal data with any properties that can be enforced by linear constraints.
We use a simulation study to compare the performance of the method in Besse et al. with that of the three methods we propose. Then we apply the best method, as chosen from our simulation study, to three real data sets.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Thu 28th April 2005
4:00pm
Department of Mathematics and Statistics,
McGill University.
Correlation random fields, brain connectivity, and cosmology
We are all familiar with the correlation coefficient between two sets of numbers. Now suppose we replace the numbers by images in any number of dimensions. The correlation random field is the 'image' of correlations at all possible pairs of points in the two images. We are interested in the topology of the (random) set of high correlations, more specifically, the Euler characteristic (EC). Strangely enough the statistical properties of the EC can be used to detect connectivity in the images, that is, regions of high correlation. We apply this idea to resting state fMRI images of brain activity, brain damage due to non-missile trauma, and to connections between MS lesions and cortical thickness. The same methods are used in cosmology to look for large-scale structure in the universe, and anomalies in the cosmic microwave background from the big bang. We apply this to the latest results from the Sloan Digital Sky Survey and the Wilkinson Microwave Anisotropy Probe.
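To make the definition of the correlation random field concrete, here is a minimal sketch under one reading of the abstract: each of n subjects contributes a pair of images, and the field holds the sample correlation, across subjects, between every point of the first image and every point of the second. Images are flattened to lists of values, and the function names are hypothetical. The Euler characteristic analysis itself is not attempted here.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_field(images_x, images_y):
    """images_x[i][s] is the value at point s of subject i's first image;
    images_y[i][t] is the value at point t of subject i's second image.

    Returns field[s][t], the across-subject correlation between point s of
    the first image and point t of the second.
    """
    p, q = len(images_x[0]), len(images_y[0])
    return [
        [
            pearson([img[s] for img in images_x], [img[t] for img in images_y])
            for t in range(q)
        ]
        for s in range(p)
    ]
```

For two images of p and q points the field has p times q entries, which is why its dimension is the sum of the dimensions of the two images.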
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 26th April 2005
11:00am
Dr. Jiahua Chen
Department of Statistics and Actuarial Science,
University of Waterloo.
The universal validity of the possible triangle constraint of the Affected-Sib-Pairs
In Affected-Sib-Pair analysis, genetic marker data are collected from families with at least two sibs affected by a disease under investigation. At any locus not linked to the disease gene, a sib pair shares 0, 1 or 2 alleles identical by descent (IBD) with probabilities of 0.25, 0.5 and 0.25 respectively. With linkage, the IBD value increases stochastically. Holmans (1993) discovered that the IBD distribution at the single disease locus of an affected sib pair satisfies a `possible triangle constraint', and that at markers linked to the single disease locus, the possible triangle constraint still holds if the disease is Mendelian and the male and female recombination rates are the same. This result makes it possible to sharpen the statistical procedures to achieve better power in detecting linkage. It is of statistical and genetic importance to investigate whether the possible triangle constraint remains true in general, regardless of disease types and recombination models. In this presentation, we show that the IBD distribution at any marker satisfies the possible triangle constraint, under the Hardy-Weinberg equilibrium and no-interference in crossover assumptions. We will also discuss the issue of the limiting distribution of the likelihood ratio test statistic under the possible triangle constraint.
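The constraint itself is not written out in the abstract. As it is usually stated, with z0, z1 and z2 the probabilities that an affected sib pair shares 0, 1 or 2 alleles IBD, it requires 2*z0 <= z1 <= 1/2. The small check below is a sketch under that assumption; the function name and tolerance are hypothetical.

```python
def in_possible_triangle(z0, z1, z2, tol=1e-12):
    """Check the possible triangle constraint 2*z0 <= z1 <= 1/2
    (assumed form of the constraint; see the lead-in)."""
    if abs(z0 + z1 + z2 - 1.0) > tol or min(z0, z1, z2) < -tol:
        raise ValueError("IBD sharing probabilities must be non-negative and sum to 1")
    return (2.0 * z0 <= z1 + tol) and (z1 <= 0.5 + tol)
```

The null distribution (0.25, 0.5, 0.25) sits at a vertex of the triangle, which is one reason the limiting distribution of the likelihood ratio statistic under the constraint is non-standard.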
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 19th April 2005
4:00pm
Department of Statistics,
Ohio State University.
A Variance-bias tradeoff for model list selection
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 12th April 2005
4:00pm
Department of Statistics,
UBC.
Sample Size and Effective Samples
We distinguish between two classes of sample size problems. The first is the actual sample size needed to achieve a specific inference goal. The second is an effective sample where we try to interpret one sample under one model as another sample under a different model. An effective sample leads to an effective sample size which may be more important than the sample itself.
For the actual case, we give asymptotic expressions for the expected value, under a fixed parameter, for certain types of functionals of the posterior density in a Bayesian analysis. The generality of our approach permits us to choose functionals that encapsulate different inference criteria. The dependence of our expressions on the sample size means that we can pre-experimentally obtain adequate sample sizes to get inferences with a pre-specified level of accuracy.
For the effective sample case, we find a virtual sample under one model that gives the same inferences as an actual sample under another model. We use the same prior for both models, but this is not necessary. We show that these effective samples exist and give some examples to show that their behavior is consistent with statistical intuition. The procedure can also be extended to give a notion of the effective number of parameters.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 22nd February 2005
4:00pm
Department of Mathematics and Statistics,
Dalhousie University.
Variable Selection both Practical and Conceptual
I will first look at issues of variable selection as they arise in longitudinal data analysis. In ongoing work with Eva Cantoni, Joanna Flemming and Elvezio Ronchetti, we have developed variable selection procedures based on estimated prediction error. Our computations use GEE estimates but other settings are possible. To compute the prediction error, we use the ideas of cross validation where the size of the prediction sample grows as the number of experimental units increases. To handle the cases where the number of variables is large, we use MCMC ideas as developed by Guoqi Qian and me to move through the model space. The procedure not only gives an estimate of the best model but will also give all models whose prediction error is within one standard error of the chosen model, in the spirit of bagging. I will also briefly discuss some unifying ideas of model selection using the Kullback-Leibler information. Our aim is to have a procedure which works when the true model lies outside the class of models within which we are doing the selection. This is based on ongoing work with Guoqi Qian.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Mon 14th February 2005
10:00am
Dr. Ryan Elmore
Mathematical Sciences Institute,
The Australian National University.
A fully nonparametric test for one-way layouts
We consider the use of a vector of sign statistics as the basis of a nonparametric test for equality of distributions in one-way layouts. An important feature of this test is its ability to detect a broad range of alternatives, including scale and shape differences. In this scenario, the data consist of several independent measurements on each treatment or subject. We will present finite sample and asymptotic distribution theory for our test statistics and discuss a follow-up multiple comparisons procedure based on a mixture model approximation. Finally, the methods are illustrated using two examples from the literature.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 11th February 2005
10:00am
Geophysical Statistics Project,
The National Center for Atmospheric Research, Boulder, Colorado.
Design and analysis of computer experiments (DACE): some geophysical applications
Computer experiments are increasingly used in scientific investigations as substitutes for physical experiments in cases where the latter are difficult or impossible to perform. A computer experiment consists of several runs of a computer model for the purpose of better understanding the input-output relationship. The practical difficulty in some situations is that a single computer model run may use a prohibitive amount of computational resources. The DACE approach proposes to use statistical models as less expensive surrogates for such computer models; these provide both point predictors and uncertainty characterization of the outputs. The first part of this talk describes a two-stage statistical method for computer experiments which produce multivariate output on a spatio-temporal grid with large time dimension. An ocean model will be used to illustrate the method. The second part of the talk shows preliminary results of a DACE application to the problem of fitting observed climate data to a climate model.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Wed 9th February 2005
10:00am
Alessandro Rinaldo, Ph.D candidate
Department of Statistics,
Carnegie Mellon University.
Maximum Likelihood Estimates for Large Sparse Contingency Tables
Log-Linear models are a powerful statistical tool for the analysis of categorical data. Their use has increased greatly over the past two decades with the compilation and distribution of large sparse data bases, in the social and medical sciences as well as in machine learning applications. Such data bases often take the form of high-dimensional contingency tables with a large number of empty cells. In such situations the Maximum Likelihood Estimate (MLE) of the cell mean vector is very likely to be undefined. The existence of the MLE is crucial for assessment of fit, for model selection and for interpretation. However, available results in the statistical literature do not lead directly to implementable numerical procedures, nor do they offer alternative methods of inference. Recent advances in Algebraic Statistics have suggested a more general approach to the study of Log-Linear models that takes advantage of the connections between Algebraic and Polyhedral Geometry and the theory of exponential families. In this talk I will describe geometric and combinatorial conditions for the existence of the MLE. I will show that, within the Log-Linear model framework, the set of cell mean vectors consists of points satisfying polynomial equations. I will use this characterization to define the Extended MLE. I will illustrate how the Extended MLE can be used to perform model selection and briefly comment on some computational aspects associated with its derivation.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Tue 8th February 2005
4:00pm
Dr. Pat Farrell
School of Mathematics and Statistics,
Carleton University.
A Nonlinear Conditional Probability Model for Non-Stationary Longitudinal Binary Data
In many biomedical studies, we collect data over time on each of many individuals, comprising repeated binary responses and an associated set of multidimensional covariates. When the covariates collected along with the binary responses are time dependent, the responses of an individual exhibit non-stationary longitudinal correlations. We exploit here a dynamic logistic model to fit the non-stationary longitudinal binary data, which, unlike the existing correlated binary models, allows full ranges for the correlation parameters. The regression and dependence parameters in the model are estimated by using a traditional generalized quasilikelihood (GQL) and an improved GQL (IGQL) estimation approach. We illustrate in a simulation study that estimators of the model parameters based on the IGQL approach are significantly more efficient than their GQL counterparts. In addition, the proposed IGQL approach yields the same estimates as the exact maximum likelihood (ML) approach, is easier computationally, and lends itself more readily to extensions of the proposed model; for example, if a dynamic logistic mixed model were to be considered instead. We also present an application of the proposed model and estimation procedures to the analysis of longitudinal binary data on wheezing status in children.
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Fri 4th February 2005
10:00am
Leilei Zeng, Ph.D candidate
Department of Statistics & Actuarial Science,
University of Waterloo.
Issues of Model Misspecification in the Analysis of Incomplete Longitudinal Data
A variety of methods have been proposed in the recent literature for dealing with incomplete longitudinal data, several of which require modeling of the mechanism leading to incomplete data, sometimes called the missing data process. The validity of an estimating procedure for regression parameters depends upon all model assumptions, including those for the missing data process. This raises questions regarding the impact of model misspecification for the missing data process on inferences regarding regression parameters. This talk will begin with an investigation which revealed poor frequency properties of estimators and tests of treatment effect based on a widely used imputation strategy called "last observation carried forward" (LOCF). Methods based on inverse probability weighted estimating equations will be reviewed for longitudinal data and it will be demonstrated how consistent parameter estimates result from the analysis under correct models for the missing data process with "random drop-outs" (RD). Such methods are sensitive to misspecification of the model for the drop-out process. I will then demonstrate the impact of misspecification of the drop-out model on the asymptotic and finite sample frequency properties of regression coefficients in the response model. Particular attention is given to two cases of model misspecification: the case that important covariates are omitted, and the case that there are competing reasons for drop-outs but a single-cause drop-out process is assumed at the analysis stage.
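For readers unfamiliar with the imputation strategy named in the abstract, the following minimal sketch shows what "last observation carried forward" does to a single subject's series. Missing visits are represented by None, and the function name is hypothetical. It illustrates the strategy the talk criticizes, not a recommended analysis.

```python
def locf(series):
    """Replace each missing value (None) with the most recent observed value.

    Leading missing values stay None because there is nothing to carry forward.
    """
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled
```

After a subject drops out, every later visit simply repeats the final observed response, which freezes the trajectory at the drop-out time regardless of the treatment effect.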
Statistics
Leonard S. Klinck 301, 6356 Agricultural Road, UBC
Wed 2nd February 2005
2:30pm
Department of Statistics,
University of Washington.
Bayesian Robust Inference for Differential Gene Expression
In this talk, I will consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method will be illustrated using a publicly available gene expression data set. We will compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, Significance Analysis of Microarrays (SAM), Efron's empirical Bayes, and EBarrays in both its Lognormal-Normal and Gamma-Gamma forms. Our method performs better than these alternatives, on the basis of between-replicate agreement and disagreement.
Joint with Adrian E. Raftery, Ka Yee Yeung and Roger Bumgarner.
Statistics / BRG
Sauder Industries Policy Room (#2270)
Thu 27th January 2005
3:00pm
Department of Statistics and Actuarial Science,
SFU.
Problems in the Analysis of Spatial Longitudinal Data
Sauder Industries Policy Room (#2270)
SFU Harbour Centre 515 West Hastings Street
Vancouver, BC V6B 5K3
Multi-state models can be useful in longitudinal studies where, at any point in time, an individual may be said to occupy one of a discrete set of states and interest centers on determining what influences transitions between states. For example, states may refer to the number of recurrences of an event, or the stages of a disease. Statistical methodology for the analysis of this type of longitudinal data is presented, with the added feature of examining how the rates of transition between states differ spatially over a region. Spatial random effects are also considered in a special case: the two-state mover-stayer model. This talk will be an informal discussion of the challenges of such analyses and will outline work with students Farouk Nathoo, Jason Nielsen and Laurie Ainsworth on recently developed methods for handling such problems.
Statistics / BRG
Sauder Industries Policy Room (#2270), SFU Harbour Centre
Thu 27th January 2005
1:30pm
Department of Statistics,
UBC.
Physical vs Statistical Modelling: Towards Reconciliation
The cultures of physical and statistical modellers differ greatly. However, a search for reconciliation has begun, driven by the practical requirements of handling processes over very large space-time domains, and the risks attached to them. My talk derives from my experience and that of my UBC co-researchers, Nhu Le, Yiping Dou, and Zhong Liu, with much input from Douw Steyn, an atmospheric scientist. We have been examining hourly ground level ozone concentrations over a very large part of the eastern USA. In particular, we have been seeing how to reconcile simulated data from MAQSIP, a very large deterministic model for that field, and data from about 300 sites. The data were produced over about 120 days in a single summer. I will describe our approaches and some of the results. However, much of the discussion will be devoted to more fundamental issues.