Department Seminars 2002
| DATE/PLACE: |
Tuesday, December 3, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | ANALYSIS AND MODELLING RECENT CLIMATE DRIVEN CHANGES IN STREAMFLOW IN BRITISH COLUMBIA AND YUKON |
| SPEAKER: |
Paul H. Whitfield Manager, Science Division Meteorological Service of Canada Environment Canada |
| During recent decades variations in climate conditions have occurred coincident to significant changes in streamflow in British Columbia. Modelling these variations provides an insight into how rivers and streams might behave in a changed climate. We investigate the ability of empirical downscaling models to resolve these changes using ensemble neural networks forced with large-scale atmospheric circulation conditions from the NCEP/NCAR atmospheric model reanalysis project. Five-day average streamflow data from British Columbia and the southern Yukon are predicted using atmospheric circulation and moisture fields from 1965-1986 as model inputs. Ability of the models to predict streamflow during the 1987-1998 test period is then evaluated using a combination of model performance statistics, comparisons between long-term averages, and results from non-parametric statistical tests. Correspondence between modelled and observed changes in long-term average streamflow is assessed using results from a recent study of regionalization of hydrologic change in Canada. In particular, the ability of the models to capture various aspects of the hydrologic regime in the different watersheds is demonstrated. | |
| DATE/PLACE: |
Thursday, November 28, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Joint Workshop / Biostats Research Group |
| TITLE: |
Accounting for Time-Activity Patterns in Human Exposures to Air Pollution: A Computer Modelling Approach. |
| SPEAKER: |
Jim Zidek Department of Statistics, UBC |
|
Based on work with a number of co-investigators, notably Jean Meloche, Chris Chatfield, Gavin Shaddick and Rick White, I will present a general framework for estimating human exposure to environmental hazards, focusing particularly on air pollution, the topic of much of the research by me and my co-investigators at the Universities of Bath and BC. Although the approach must be adapted for use in particular contexts, it does provide a blueprint. Moreover, general inferential procedures usable in particular contexts can be given. The modules for that model can to some extent be developed independently of one another. I will emphasize a particularly important one, a large computer model that accounts for the random behavior of individuals. That model, an adaptation of the well-known pNEM model developed by the EPA of the USA, accounts for behaviour by sampling from 24-hour recall time-activity databases. On approval, users can access the model through their Internet web-browsers. They can then construct their own versions of this model for any environmental hazard, online, by specifying its parameters and uploading any relevant datasets to the UBC site. I will show some outputs from that model and how it may be used by regulators in scenario analysis. I will also describe how it can be used in connection with other modules, for example one that enables the local levels of a hazard to be predicted using data from remote monitoring sites. Application to the estimation of personal exposures to London's and Vancouver's PM10 field will be discussed. Implications of that work for health risk assessment will be indicated.
|
|
| DATE/PLACE: |
Thursday, November 21, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Biostats Research Group |
| TITLE: |
Visual Perception and its Relationship to Graphical Methods |
| SPEAKER: |
Anona Thorne Canadian HIV Trials Network |
|
The workings of human visual perception play a strong role in the interpretation of graphs. Consideration of the mechanisms of perception should therefore play a more prominent role in their design than is often the case in current practice. This talk will explore the impact on graphics of a number of perceptual features and suggest more effective alternatives to some common graphical techniques. In each case, examples of different options for plotting the same data will be examined and compared.
|
|
| DATE/PLACE: |
Tuesday, November 19, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Model Discrimination Criterion for Finding the Active Factors in Screening Experiments |
| SPEAKER: |
Tanya Kolosova, Ph.D student Department of Statistics Tel-Aviv University, ISRAEL |
| Screening designs are often used in industry to study a large number of factors in a relatively small number of runs. A screening experiment is typically carried out to identify the probable active factors. One problem with screening designs is that they do not lead to unequivocal conclusions. In such cases, additional runs are required. These runs should be selected in such a way that they help to resolve the ambiguity. Within the Bayesian construct, R.D. Meyer, D.M. Steinberg and G.E.P. Box have developed a method for designing a follow-up experiment to resolve this ambiguity. The idea is to choose runs that allow maximum discrimination among the plausible models. This method was implemented by Meyer et al. for models with linear main effects and interactions up to any desired order. Continuing the work of Meyer et al., I am researching features of this method, including its sensitivity to the starting design. It appears that non-regular designs with complex aliasing have better resolving capabilities. I am also developing an extension of this methodology for models with higher-order main effects and interactions of any desired order. | |
| DATE/PLACE: |
Thursday, November 14, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Biostats Research Group |
| TITLE: |
Ecological Inference and Sensitivity Analyses for Geographical Correlation Studies |
| SPEAKER: |
Jon Wakefield Dept of Statistics/Biostatistics, University of Washington |
|
In this talk I will discuss geographical correlation studies in which area-level (aggregate) data are used to investigate the association between disease risk and (often) environmental factors that may be in air, water or soil. Ecological studies are appealing as they may use routinely-available data, and can produce exposure contrasts that are much larger than those in individual-level studies. Unfortunately estimates from such studies are subject to a variety of biases that are not present in individual-level studies. These biases arise from within-area variability in exposures and confounders and can lead to the ecological fallacy. I will discuss a number of methods for both controlling for different forms of the bias, and for assessing the sensitivity to bias. Finally I will discuss the role of spatial models for the residuals, and will argue that elaborating such models will often not be merited in the context of ecological regression.
|
|
| DATE/PLACE: |
Tuesday, November 12, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Globally robust inference for simple linear regression models with the repeated median slope estimate. |
| SPEAKER: |
Jafar Khan Department of Statistics, UBC |
|
Globally robust inference takes into account the asymptotic bias of the point estimates (Adrover, Salibian-Barrera and Zamar, 2002). To construct robust confidence intervals for the simple linear regression slope, the authors selected the generalized median of slopes (GMS) as their point estimate, considering its good bias behavior and asymptotic normality. However, GMS has a breakdown point of only 0.25, its asymptotic normality is established under very restrictive conditions, and its bias bound is known only for symmetric carrier distributions. In this study, we propose the repeated median slope (RMS) estimate as an alternative choice. RMS has a breakdown point of 0.50, its asymptotic normality holds under mild assumptions, and the bias bound for RMS is known for general carrier distributions. The proposed method achieves, more or less, the same observed coverage levels while it constructs intervals of smaller lengths, as compared to the GMS approach. |
|
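For readers unfamiliar with the repeated median slope (RMS) named in the abstract above, the point estimate can be sketched as a median of medians of pairwise slopes. This is only an illustration of the point estimator, written with hypothetical function and variable names; it does not reproduce the globally robust confidence intervals discussed in the talk.

```python
from statistics import median


def repeated_median_slope(x, y):
    """Median over i of the median over j != i of pairwise slopes."""
    n = len(x)
    inner_medians = []
    for i in range(n):
        # Slopes from point i to every other point with a distinct x value.
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(n)
            if j != i and x[j] != x[i]
        ]
        if slopes:
            inner_medians.append(median(slopes))
    return median(inner_medians)


# Illustrative data: y = 2x + 1 with two gross outliers at the ends.
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[0] = -500
y[9] = 1000
slope = repeated_median_slope(x, y)
```

In this made-up data set two of the ten points are contaminated, yet every inner median taken at a clean point is still dominated by clean slopes, which is the intuition behind the 0.50 breakdown point quoted in the abstract.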
| DATE/PLACE: |
Friday, November 8, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Risk Analysis in Geoscience and Remote Sensing |
| SPEAKER: |
Prof. David R. Brillinger Statistics Department University of California Berkeley, CA |
| Risk analysis, that is, the problem of estimating the probabilities of rare and damaging events, unifies the geosciences. One can mention the risks from floods, earthquakes, forest fires and space debris. The probabilities may be fed into the computation of insurance premiums. The Poisson process often plays a prominent role, while in the talk marked point processes will have a basic function. Various ways to collect and extrapolate data will be described and examples from various fields will be presented. | |
| DATE/PLACE: |
Thursday, November 7, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Biostats Research Group |
| TITLE: |
Modelling Count Data Time Series |
| SPEAKER: |
Rong Zhu Department of Statistics UBC |
|
Count data time series arise from various dynamic phenomena such as daily counts of children with respiratory symptoms or migrating birds, monthly insurance claimant numbers, or lesion numbers of an MS patient observed roughly every six weeks. These time series may be observed at equally-spaced or unequally-spaced time points. Modelling such count data time series requires appropriate discrete-time or continuous-time stochastic processes as the probabilistic framework to describe the auto-dependence and marginal features. In this talk, we choose the well-known continuous-time Markov processes based on binomial thinning as the stochastic model. A convenient algorithm is given for computing the conditional probability mass function. This new result is crucial to maximum likelihood estimation. We apply this kind of model to the WCB claims data to illustrate the main idea.
|
|
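As a rough illustration of why a conditional probability mass function matters for likelihood work with thinning-based count models, the sketch below computes the one-step transition probabilities of a discrete-time Poisson INAR(1) model. This is an assumed, simpler analogue and not the continuous-time algorithm of the talk: each of the j previous counts survives with probability `alpha` (binomial thinning) and an independent Poisson(`lam`) number of new arrivals is added. All names and parameter values are hypothetical.

```python
from math import comb, exp, factorial


def binom_pmf(m, n, p):
    """P(M = m) for M ~ Binomial(n, p)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


def inar1_transition_pmf(k, j, alpha, lam):
    """P(X_t = k | X_{t-1} = j): binomial survivors convolved with Poisson arrivals."""
    return sum(
        binom_pmf(m, j, alpha) * poisson_pmf(k - m, lam)
        for m in range(min(j, k) + 1)
    )


# Sanity checks on illustrative parameter values.
probs = [inar1_transition_pmf(k, 5, 0.4, 1.5) for k in range(60)]
total = sum(probs)
cond_mean = sum(k * p for k, p in enumerate(probs))
```

Under these assumptions the log-likelihood of an observed series is the sum of the logarithms of such transition probabilities, which is why a convenient way of evaluating them is central to maximum likelihood estimation.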
| DATE/PLACE: |
postponed Tuesday, November 5, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Joint Workshop / Biostats Research Group |
| TITLE: |
TBA |
| SPEAKER: |
Rachel MacKay Department of Statistics UBC |
| DATE/PLACE: |
Tuesday, October 29, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Mixtures-of-Experts of Time Series Models |
| SPEAKER: |
Alex Carvalho Department of Statistics UBC |
|
We consider a novel class of non-linear models based on mixtures of local time series. In these models, at any given time point t, we have a number J of regression models, denoted experts, and a latent indicator variable, whose distribution may depend on the same covariates as the experts, that determines which regression model is observed. These models are referred to as mixtures-of-experts of time series, and they bear some similarity to many other non-linear models in the literature, such as the threshold autoregressive (TAR) models and the Bayesian treed models. For the experts, we considered normal, Poisson, binomial, gamma and generalized-t (heavy tail) regressions. For a fixed number J of components, we present the asymptotic properties of the maximum likelihood estimator for mixtures-of-experts of time series, under the assumptions of correctly and incorrectly specified models. We also provide sufficient conditions to guarantee stochastic stability (uniform ergodicity). Assuming the model is correctly specified, Monte Carlo simulations suggested that the BIC provides a consistent estimator for the true number of experts J. One of the advantages of mixtures-of-experts is their ability to model general conditional densities by increasing the number of mixed components J. Finally, we illustrate the performance of mixtures-of-experts in density forecasting for simulated and real financial data. |
|
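The conditional density described in the abstract above, a covariate-dependent mixture of J expert regressions, can be sketched as follows. The abstract does not state the form of the latent indicator's distribution; the softmax (multinomial logit) gating, the single scalar covariate, and the normal experts with linear means used here are assumptions made for illustration, and every name is hypothetical.

```python
from math import exp, pi, sqrt


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to one."""
    top = max(scores)
    weights = [exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))


def moe_density(y, x, gates, experts):
    """Mixture-of-experts conditional density of y given covariate x.

    gates:   list of (a, b), gating score a + b * x for each expert (assumed form)
    experts: list of (c, d, sd), normal expert with mean c + d * x
    """
    probs = softmax([a + b * x for a, b in gates])
    return sum(
        p * normal_pdf(y, c + d * x, sd)
        for p, (c, d, sd) in zip(probs, experts)
    )


# Two illustrative experts; in a time series setting x could be a lag of y.
gates = [(0.0, 1.0), (0.0, -1.0)]
experts = [(1.0, 0.5, 1.0), (-1.0, 0.2, 2.0)]
# Riemann sum over y in [-30, 30] as a check that the density integrates to one.
area = sum(moe_density(-30 + 0.01 * i, 0.7, gates, experts) * 0.01 for i in range(6000))
```

Increasing the number of experts in `gates` and `experts` gives the density more components to mix, which is the flexibility the abstract refers to.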
| DATE/PLACE: |
Thursday, October 24, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Joint Workshop / Biostats Research Group |
| TITLE: |
Integrating Interdisciplinary Information for Innovative Investigation, Interpretation, Inspection, and Querying Engine [IQ Engine] |
| SPEAKER: |
Janet McManus, St. Paul's Hospital/UBC |
|
The McDonald Research Laboratories (MRL) / iCAPTUR4E Centre is a UBC research laboratory located at the St. Paul's Hospital campus. It is a publicly funded research organization that uses advanced methods and technology to formulate, design, execute, analyze, synthesize, and report experimental findings regarding the causes, mechanisms, outcomes and implications of circulatory and respiratory diseases. Similarly, IBM Life Sciences is dedicated to rapidly bringing leading-edge technology out of the laboratory and into the marketplace for customers and Business Partners in the fields of pharmaceutical research, biotechnology, genomics, proteomics, health care, and academic research. It is these parallel missions that have led to the IQ Engine research collaboration between IBM Life Sciences and the MRL/iCAPTUR4E Centre aimed at producing a powerful query engine for data mining and innovation discovery. This IQ Engine enables the Centre to explore and understand the interrelationships between genomic and phenotypic data in relation to heart, lung and blood vessel disease. The Pilot Project will use a narrow subsection of data with a view to expanding it to query MRL/iCAPTUR4E's complete data framework. The Pilot Project is focused on control or rheumatic human heart valve datasets including patient information (clinical and demographic), echocardiography and gross pathology. Datasets on the valve tissue include immunohistochemical quantitation of valve structural components, microarray profiles, and proteome profiles. Datasets on cultured myofibroblasts from the valve tissue include microarray profiles, proteome profiles, immunohistochemical quantitation, calcium imaging and kinase signalling data. With these datasets, the Pilot Project attempts to answer two specific research questions: (1) Is there any correlation between the expression of a certain gene/pathway and Ca2+ regulation in valve myofibroblast cells? 
(2) What are the phenotypic, transcriptomic and proteomic differences between control heart valves and diseased heart valves? The implementation of the Pilot Project will combine existing IBM program products, third party products, IBM Research code, and custom written code to create a search tool to uncover relationships in the selected input data. Statistical analysis is performed by SAS. Display of textual data and visualization of data trends is performed using Spotfire's DecisionSite as a means of visually displaying and exploring the query results. The IQ Engine is an integration infrastructure for various data sources coupled with a flexible and general purpose set of query, display, analysis, and visualization capabilities. This solution will enable the MRL/iCAPTUR4E Centre to search for new pathogenesis mechanisms within diverse datasets and allows for new research synergies across our organization.
|
|
| DATE/PLACE: |
Thursday, October 10, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Biostats Research Group |
| TITLE: |
Bent-Cable Regression -- Why Fish Don't Like Kinks |
| SPEAKER: |
Grace Chiu Department of Statistics SFU |
|
A method commonly used for estimating the onset of change involves fitting a sharply kinked line, sometimes called a broken stick, to a graph of the response and explanatory variables. For example, consider the relationship of abundance versus time for a declining fish population. A fisheries manager would commonly use the estimated date of onset based on a broken-stick fit as a clue to the actual cause of the decline. However, researchers in this and other fields are often tempted to conclude abruptness even when there is seldom solid theory to justify such claims. To address this issue, we use what we call the bent-cable model whose quadratic bend of non-negative width generalizes the kink of a broken stick. Part 1 of this talk features worked examples of bent-cable regression for assessing abruptness of change. They demonstrate that data for typical biological phenomena are far too imprecise to support the abruptness notion associated with the broken-stick model. Part 2 briefly discusses the irregularity intrinsic to bent-cable regression, and the practical design conditions that yield regular asymptotics for the estimation problem.
|
|
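To make the idea of a quadratic bend that generalizes the broken-stick kink concrete, here is a minimal sketch of a bent-cable mean function. The parametrization (linear incoming phase, quadratic bend of half-width `gamma` centred at `tau`, linear outgoing phase) is an assumption chosen for illustration, and the names are hypothetical rather than taken from the talk.

```python
def bent_cable_basis(t, tau, gamma):
    """Zero before the bend, quadratic inside it, linear (t - tau) after it."""
    if gamma == 0:
        # Zero-width bend: the basis collapses to the broken-stick kink.
        return max(t - tau, 0.0)
    if t < tau - gamma:
        return 0.0
    if t > tau + gamma:
        return t - tau
    return (t - tau + gamma) ** 2 / (4 * gamma)


def bent_cable(t, b0, b1, b2, tau, gamma):
    """Mean response: slope b1 before the bend, slope b1 + b2 after it."""
    return b0 + b1 * t + b2 * bent_cable_basis(t, tau, gamma)


# Illustrative values: the bend is centred at tau = 2 with half-width gamma = 1.
left_edge = bent_cable_basis(1.0, 2.0, 1.0)
centre = bent_cable_basis(2.0, 2.0, 1.0)
right_edge = bent_cable_basis(3.0, 2.0, 1.0)
```

With this form the curve and its slope are continuous at both edges of the bend, and an estimated `gamma` close to zero is what would support an abrupt, broken-stick change.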
| DATE/PLACE: |
Thursday, 3 October 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Flexible Modeling of High Throughput Screening Data and Model Assessment |
| SPEAKER: |
Yuanyuan (Marcia) Wang Department of Statistics & Actuarial Science University of Waterloo |
| High Throughput Screening (HTS) is used in drug discovery to screen large numbers of compounds against a biological target. Data on activity against the target are collected for a representative sample (experimental design) of compounds selected from a collection. The explanatory variables are chemical descriptors of compound structure. Some previous work shows that local methods, namely K-nearest neighbors (KNN) and classification and regression trees (CART), perform very well. Some adaptations to KNN and CART including averaging over subsets of explanatory variables, bagging, and boosting, have also been considered. After briefly reviewing and comparing these techniques, I will focus on estimating activity and error rates for assessing model performance. This will shed some light on how various models handle large random or systematic errors in drug screening data. This is joint work with Dr. Hugh Chipman and Dr. William Welch at the University of Waterloo. | |
| DATE/PLACE: |
Tuesday, September 26, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Biostats Research Group |
| TITLE: |
Investigating a Robust Variance-Components Approach for Linkage Analysis in Complex Traits |
| SPEAKER: |
Lisa Kuramoto Department of Statistics UBC |
| Model-based linkage methods have had limited success in locating quantitative trait loci (QTLs) in complex traits since the underlying genetic mechanisms are not well known. As a result, robust approaches for detecting linkage have grown in popularity. We discuss a mixed effects model, which involves the estimation of genetic and non-genetic variance components, as well as recombination fractions. Using the Genometric Analysis Simulation Program (GASP), we first attempt to investigate the properties of this method on simple traits, which differ in terms of their variance components. To further understand its performance in a complex setting, we apply this method to simulated, familial data for an oligogenic disease with quantitative risk factors from the 10th Genetic Analysis Workshop (GAW10). We see that the ability of the variance-components approach to map QTLs depends on the amount of variability it contributes to the quantitative trait. As well, we find that the presence of the recombination fraction in the model results in similar estimates of the variance components across the chromosome; however, it does not seem to improve the mapping ability of the model. | |
| DATE/PLACE: |
Tuesday, July 23, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: |
1. A nonlinear approach to the linear calibration problem; and 2. Gini - an alternative measure of variability |
| SPEAKER: |
Dr. Edna Schechtman Dept. of Industrial Engineering & Management Ben Gurion University |
|
Calibration typically relates a standard unit x to an instrumental measurement Y by using data x_i and Y_i collected from a calibration experiment to estimate the calibration curve. After the calibration curve is estimated, new measurements Y^* are taken and the calibration curve is then used with the Y^* to estimate the associated standard values, by point and interval estimates. The standard or classical formula was derived by Eisenhardt (1939). While his formula is useful in many instances, there are other cases where either it is not applicable or it gives too wide an interval. In this talk, a different approach is suggested. It is based on reparametrizing the linear model so that the unknown value of the standard becomes a parameter in the nonlinear regression model. Then, standard packages for nonlinear regression can easily provide a point estimator, as well as its standard error, which can then be modified and used as the basis for the confidence interval. The intervals are centered at the point estimator (as opposed to Eisenhardt's intervals), and it can be shown that they are shorter, for a reasonable coverage rate. We show, via simulation, that the major advantage of the proposed method arises when the measurement errors are moderately large. Gini Mean Difference (GMD) is an alternative measure of variability, which is more robust to extreme values, and is used in the area of income distributions. There are more than a dozen ways to write Gini. One of them is as the covariance between the variable and its cumulative distribution. This presentation of the Gini index opens a wide area of research, leading to definitions of Gini covariance, Gini correlation, Gini regression, etc. In this brief talk I'll introduce some of the indices and define a new family of (extended) Gini correlations. |
|
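The covariance form of Gini mentioned in the abstract above can be checked numerically. The sketch below computes the Gini mean difference directly as the average absolute difference over all pairs, and again as four times the covariance between the values and their empirical cumulative distribution. The factor n/(n-1) is a finite-sample adjustment that makes the two sample versions agree exactly; the function names are hypothetical.

```python
def gini_mean_difference(xs):
    """Average absolute difference over all pairs of distinct observations."""
    n = len(xs)
    total = sum(abs(a - b) for i, a in enumerate(xs) for b in xs[i + 1:])
    return 2 * total / (n * (n - 1))


def gini_via_covariance(xs):
    """Four times cov(X, F(X)), with F the empirical CDF, rescaled by n/(n-1)."""
    n = len(xs)
    ordered = sorted(xs)
    cdf = [(i + 1) / n for i in range(n)]  # empirical CDF at each order statistic
    mean_x = sum(ordered) / n
    mean_f = sum(cdf) / n
    cov = sum((x - mean_x) * (f - mean_f) for x, f in zip(ordered, cdf)) / n
    return 4 * cov * n / (n - 1)


# Illustrative data only.
data = [1, 2, 4, 8]
direct = gini_mean_difference(data)
by_cov = gini_via_covariance(data)
```

Writing Gini as a covariance is what suggests the analogues of covariance, correlation and regression that the abstract lists.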
| DATE/PLACE: |
Thursday, July 18, 2002, 14:15 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | High Throughput Screening Data for Drug Discovery |
| SPEAKER: |
Dr. William Welch Department of Statistics and Actuarial Science University of Waterloo |
|
This talk will give an overview of statistical methods and issues arising in drug discovery, specifically the collection and analysis of high throughput screening (HTS) data. Statistical analysis of HTS data is aimed at predicting activity of chemical compounds (potential drugs) against a given biological target. The explanatory variables used are chemical descriptors of compound structure. Data sets tend to have large numbers of observations, and there are many potential explanatory variables. Some work comparing statistical analysis methods on several HTS data sets shows that local methods, namely K-nearest neighbours (KNN) and classification and regression trees (CART), perform well. Methods like CART have to be used with the objectives of HTS in mind, however. For example, tree pruning to avoid over-fitting seems unnecessary and may even be harmful to identifying active compounds. I shall also describe some adaptations to KNN and CART that are based on model averaging. This is joint work with Hugh Chipman and Marcia Wang at the University of Waterloo and with Raymond Lam and Stan Young of GlaxoSmithKline.
|
|
| DATE/PLACE: |
Thursday, July 11, 2002, 13:45 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | A Marginal Ergodic Theorem |
| SPEAKER: |
Dr. Michael Lavine Institute of Statistics & Decision Sciences Duke University |
| In recent years there have been several papers giving examples of Markov Chain Monte Carlo (MCMC) algorithms whose invariant measures are improper (have infinite mass) and which therefore are not positive recurrent, yet which have subchains from which valid inference can be derived. These are nonergodic (not having a limiting distribution) Markov chains (MC's) that can be written, possibly after transformation, as Z = {Z(n); n >= 0} = {(X(n),Y(n)); n >= 0} for which the subchain X(n) is ergodic (has a limiting distribution). This paper gives a marginal ergodic theorem which (a) gives a formula for bounding the liminf and limsup as n --> infinity of the distribution of X(n) and (b) often allows for direct calculation of the limiting distribution, should one exist. | |
| DATE/PLACE: |
Tuesday, July 9, 2002, 14:15 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Statistical inference for compositional processes |
| SPEAKER: |
Dr. Peter Guttorp National Research Center for Statistics & the Environment University of Washington |
| Compositional data arise in a variety of environmental and ecological situations. For example, biological monitoring of water quality can be done by studying the composition of the population of various species. We develop a framework for compositional data analysis and outline how standard models in statistics translate to the compositional framework. Three examples are studied in some detail: an ecological hypothesis is tested in an analysis of variance framework; a particulate matter air pollution series is used to distinguish sources; and data from a biological monitoring program are modeled in space and time. | |
| DATE/PLACE: |
Friday, June 28, 2002, 14:15 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Modelling and Analyzing Spatial-Temporal Environmental Data |
| SPEAKER: |
Dr. Abdel El-Shaarawi National Water Research Institute, Burlington, Ont. and Dept. of Mathematics & Statistics, McMaster University |
|
In recent years more effort has been directed to the development of statistical methods for the detection, estimation and prediction of environmental changes. The aim is to generate efficient information for use in environmental management. In this talk an overview will be presented and areas where further research is needed will be emphasized. Environmental data are routinely collected from a fixed set of locations within an ecosystem and comprise time series of discrete and continuous measurements. The interest is to model trend and seasonality at each location and to combine the location-specific models into an overall model for making inferences about the entire ecosystem. The results of using kernel smoothing, regression models with dependent error processes and quasi-likelihood methods to investigate the spatial and temporal structure of several water and air quality data sets will be discussed. |
|
| DATE/PLACE: |
Tuesday, June 25, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Imputation Methods for Missing Covariates in Nonlinear Mixed-effects Models, with Application in AIDS Studies |
| SPEAKER: |
Eishita Ali Department of Statistics University of British Columbia |
| Missing data is a frequently encountered problem in practice, especially in longitudinal studies. Commonly used ad hoc methods include the complete-case method, the mean-value imputation method, the last-value-carried-forward method, and simple interpolation methods. However, it is known that these simple methods may produce misleading results. In this project, we consider missing covariates problems in nonlinear mixed-effects (NLME) models for longitudinal data, and propose three new imputation methods: an interpolation method, a model-based imputation method and a multiple imputation method. We compared the proposed methods with the commonly used methods via simulations and found that the proposed methods are better than the commonly used methods in the sense that they have smaller biases and MSEs. We applied these methods to a real dataset in AIDS studies. | |
| DATE/PLACE: |
Tuesday, April 16, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Calculating information matrices for finite mixture models |
| SPEAKER: |
Dr. Murray Jorgensen Visitor: Mathematics & Statistics University of Victoria On leave from: Department of Statistics University of Waikato New Zealand |
| The problem of the efficient calculation of the observed information matrix after fitting a finite mixture model by the EM algorithm is considered. A recommended approach is described and illustrated by an example. | |
| DATE/PLACE: |
Tuesday, March 26, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Nonparametric Bayes approaches to infer mixing distributions |
| SPEAKER: |
Dr. Michael Newton Departments of Statistics and of Biostatistics and Medical Informatics University of Wisconsin-Madison |
|
Routinely in statistical applications, hierarchical models arise in which unobserved random effects contribute to heterogeneity amongst sampling units. An easily computable, smooth nonparametric estimate of the underlying mixing distribution can be derived as an approximate nonparametric Bayes estimate under a Dirichlet process prior. I will discuss the recursive estimation algorithm, its consistency properties, and its application in several examples, including its use in the analysis of microarray gene expression data.
|
|
| DATE/PLACE: |
Tuesday, March 12, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | IDENTIFICATION OF GENES EXPRESSED IN EARLY-STAGE LUNG CANCERS |
| SPEAKER: |
Dr. Steven Jones Head of Bioinformatics Genome Sequence Center at the British Columbia Cancer Agency |
|
We have established a multi-disciplinary approach to the large-scale high-throughput identification of genes involved in early stage cancers, under the auspices of Genome Canada. As part of this study we will be analyzing 166 Serial Analysis of Gene Expression (SAGE) libraries to profile mRNA expression in approximately 20 different tissue types. One particular focus has been the development of gene expression profiles for early stage lung cancer. By utilizing novel bronchoscopy techniques we have been able to obtain tissue samples from lung carcinoma in-situ. Although only small amounts of tissue are obtained, these are sufficient for the preparation of SAGE and micro-SAGE libraries. By sequencing the SAGE libraries we have been able to generate comprehensive and deep expression profiles for lung cancer. Ten libraries have been analyzed to date, five each for cancerous and normal samples. We have identified 14,624 different transcripts in these data. Eighty-four of these genes are consistently up-regulated within the lung cancer samples (58 of which are lung cancer specific and not observed in normal lung) and 122 genes have been identified as being consistently down-regulated in these early-stage lung tumors. To facilitate analysis of our SAGE data we have developed an expression visualization tool, SageSpace, and an analysis database, SageDB. Using these tools, we have been able to assess the heterogeneity between lung cancer samples. Through orthogonal comparison to other publicly available SAGE and EST data we have also been able to assess the representation of these lung cancer associated genes in other adult tissues. Our analysis indicates that four of the transcripts found to be lung cancer specific show expression in only one normal adult tissue. This approach has also allowed us to compare lung cancer expression with expression data from other cancers and other diseases. 
This information will be useful in determining the potential of existing drug therapies for application in lung cancer treatment. The SageDB system links expression data with a number of biological databases, including PFAM, SwissProt, OMIM, dbEST, BIND and KEGG. This functionality allows us to recreate expression profiles for specific biological pathways, e.g. apoptosis, allowing heterogeneity to be assessed in a pathway specific manner, the results being graphically visualized using the SageSpace viewer.
|
|
| DATE/PLACE: |
Tuesday, February 5, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Mixtures-of-Experts of Generalized Linear Time Series |
| SPEAKER: |
Alexandre Carvalho Northwestern University |
| We consider a novel class of non-linear models based on mixtures of local generalized linear time series. In our construction, at any given time, we have a certain number of generalized linear models (GLM), denoted experts, where the vector of covariates may include functions of lags of the dependent variable. Additionally, we have a latent variable, whose distribution depends on the same covariates as the experts, that determines which GLM is observed. This structure is considerably flexible, as was shown by Jiang and Tanner in a series of papers for mixtures of GLM with independent observations. For parameter estimation, we show that maximum likelihood (ML) provides consistent and asymptotically normal estimators under certain regularity conditions. We perform some Monte Carlo simulations to study the properties of the ML estimators for finite samples. Finally, we apply the proposed models to study some real examples of time series in Marketing and Finance. | |
| DATE/PLACE: |
Tuesday, January 29, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TYPE: | Joint Workshop / Biostats Research Group |
| TITLE: |
Empirical likelihood methods for comparison of survival functions |
| SPEAKER: |
Yichuan Zhao Florida State University |
|
The use of empirical likelihood in survival analysis was initiated by Thomas and Grunkemeier (1975) who derived pointwise confidence intervals for the survival function. Since the breakthrough work of Owen (1988, 1990) the method has been applied to a variety of statistical problems. The goal of our research is to develop the approach for the comparison of survival functions for k-sample problems in survival analysis. We derive an empirical likelihood simultaneous confidence band for the ratio of two survival functions based on independent right-censored data. Earlier authors have studied such bands for the difference of two survival functions, but the ratio provides a more appropriate comparison in some applications, e.g., in comparing two treatments in biomedical settings. Our approach also works for the difference of two cumulative hazard functions. A test for equality of corresponding hazard functions is also constructed, and consistency against any fixed alternative is established. We develop a Monte Carlo simulation method to approximate the null distribution of the test statistic. Cumulative hazard ratios appear to be more tractable than ratios of survival functions or differences of cumulative hazard functions in the k-sample setting. However, the band for the ratio of survival functions is more stable and narrower than the band for the ratio of cumulative hazard functions. A goodness-of-fit test is developed for checking proportional hazards in k-sample problems. For the comparison of two distributions in the random censorship model (independent competing risks model without censoring), we construct empirical likelihood confidence bands for the ratio of the two cumulative hazards and the ratio of two survival functions. Goodness-of-fit tests for the Koziol-Green model and the equality of the corresponding hazard functions are also developed. We extend our approach to adjust for covariate effects.
All the corresponding results are established under quite general conditions. The proposed methods are illustrated with real data from a Mayo Clinic trial involving a treatment for primary biliary cirrhosis (PBC) of the liver. |
|
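The empirical likelihood bands themselves are not reproduced here. As a minimal sketch of the quantity being compared, the code below computes Kaplan-Meier estimates from two right-censored samples and takes their ratio at a time point. The function names and the `(times, events)` input layout are assumptions for the example.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of the survival probability S(t).

    times  : observed times (event or censoring)
    events : 1 if the time is an observed event, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n = len(data)
    surv = 1.0
    at_risk = n
    i = 0
    while i < n and data[i][0] <= t:
        current = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at the same time.
        while i < n and data[i][0] == current:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= removed
    return surv


def survival_ratio(sample_a, sample_b, t):
    """Point estimate of S_a(t) / S_b(t); each sample is (times, events).

    Raises ZeroDivisionError if the estimate for sample_b is zero at t.
    """
    return kaplan_meier(*sample_a, t) / kaplan_meier(*sample_b, t)
```

For example, `survival_ratio(([1, 2, 3, 4], [1, 1, 1, 1]), ([1, 2, 3], [1, 0, 1]), 2.5)` compares an uncensored sample with one that has a censored observation at time 2.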
| DATE/PLACE: |
Monday, January 28, 2002, 16:15 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | The generalized AR(1) process and its applications to non-normal time series. |
| SPEAKER: |
Rong Zhu Dept. of Statistics, UBC |
| The theory of continuous-time generalized AR(1) processes is developed for modelling non-normal time series with equally or unequally spaced time observations, which may be count data or positive-valued data. Such a process is a Markov process represented as the sum of a dependent term (involving an extended thinning operation) and an innovation term. The stationary distribution of a continuous-time generalized AR(1) process can have support on non-negative integers or positive reals; common distributions such as Poisson, negative binomial and Gamma are included. In this talk, we will introduce the continuous-time generalized AR(1) process and its properties, as well as the characterization of its stationary distribution. The modelling procedure will be illustrated by two real cases with count data and positive-valued data respectively. | |
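To make the "dependent term plus innovation term" representation concrete, here is a minimal sketch of the simplest count-valued case: binomial thinning with a Poisson stationary distribution, observed at unequally spaced times. The parametrization (stationary mean `lam`, decay rate `mu`, survival probability `exp(-mu * gap)`) is an assumption chosen for illustration; the extended thinning operations covered in the talk are more general.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def thin(rng, count, keep_prob):
    """Binomial thinning: each of `count` units survives with keep_prob."""
    return sum(1 for _ in range(count) if rng.random() < keep_prob)


def simulate(times, lam, mu, seed=0):
    """Simulate counts at the given increasing observation times.

    Over a gap of length d the survival probability is alpha = exp(-mu * d),
    and the innovation is Poisson with mean lam * (1 - alpha), which keeps
    the Poisson(lam) distribution stationary.
    """
    rng = random.Random(seed)
    x = poisson(rng, lam)  # start from the stationary distribution
    path = [x]
    for prev_t, t in zip(times, times[1:]):
        alpha = math.exp(-mu * (t - prev_t))
        x = thin(rng, x, alpha) + poisson(rng, lam * (1.0 - alpha))
        path.append(x)
    return path
```

Because the thinning probability depends only on the elapsed time between observations, irregular spacing needs no special handling.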
| DATE/PLACE: |
Friday, January 25, 2002, 9:00am Room 308, Angus Building 2053 Main Mall, UBC |
| TITLE: | A Method for Computing Runs- and Scans-Related Probabilities |
| SPEAKER: |
Dr. Galit Shmueli Department of Statistics, Carnegie Mellon University |
|
Runs and scans are two common patterns that are used for constructing stopping and switching rules in various fields (e.g., quality control, DNA sequencing, radar detection). Such rules are usually very intuitive and simple to apply, and are usually designed according to empirical rather than theoretical considerations. In contrast to the simplicity of understanding and using runs and scans rules, their probabilistic nature is quite complicated. In this talk we present a new method that leads to expressions for runs- and scans-related probability and generating functions. The talk will focus both on the mathematical derivations and on the application of the resulting formulas to various industrial statistical procedures. Our method generalizes and modernizes Feller's method (1968). It is based on constructing recurrence relations and solving linear equations, leading to an expression for the probability generating function of the runs- or scans-related variables. We then use the special form of the generating function to derive the probability function with the aid of efficient numerical methods. Finally we introduce the SQC Online website (http://www.stat.cmu.edu/~galit/SQCOnline). We created this interactive site in order to make our method useful for practitioners in industry. According to the user's input, the site yields probabilities and plots for a variety of industrial applications (including sampling inspection, system reliability, control charts, and continuous sampling), allowing users to study and plan their process. |
|
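The generating-function method of the talk is not reproduced here. As a simpler illustration of how a recurrence yields a runs-related probability, the sketch below computes the chance of seeing at least one run of `k` consecutive successes in `n` independent trials by tracking the length of the current trailing run. The function name and the independence assumption are choices made for the example.

```python
def prob_success_run(n, k, p):
    """P(at least one run of k consecutive successes in n Bernoulli(p) trials).

    state[j] holds the probability that no run of length k has occurred yet
    and the current trailing run of successes has length j (0 <= j < k).
    """
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability that a run of length k has already occurred
    for _ in range(n):
        new_state = [0.0] * k
        for j, mass in enumerate(state):
            new_state[0] += mass * (1.0 - p)  # a failure resets the run
            if j + 1 == k:
                hit += mass * p  # a success completes the run
            else:
                new_state[j + 1] += mass * p  # a success extends the run
        state = new_state
    return hit
```

With a fair coin, `prob_success_run(3, 2, 0.5)` counts the three sequences HHH, HHT and THH out of eight, giving 0.375.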
| DATE/PLACE: |
Tuesday, January 22, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Multivariate Extremes, Max-Stable Process Estimation and Dynamic Financial Modeling |
| SPEAKER: |
Zhengjun Zhang Dept. of Statistics, University of North Carolina-Chapel Hill |
| Studies have shown that time series data from finance, insurance, the environment and other fields are fat-tailed and clustered when extremal events occur. In an effort to characterize such extremal processes, max-stable and min-stable processes have been proposed since the 1980s, and some of their probabilistic properties have been obtained. However, applications are very limited due to the lack of efficient statistical estimation methods. Recently, the author has shown some probabilistic properties of the processes and proposed a series of procedures to estimate the underlying max-stable processes, i.e., multivariate maxima of moving maxima processes. In this talk, I will present some basic properties and estimation procedures for multivariate extremal processes, and illustrate how to model financial data as moving maxima processes. Examples will be illustrated with GE, Citibank and Pfizer stock index data. | |
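For readers unfamiliar with moving maxima processes, the following minimal sketch simulates a univariate one: each value is the largest of a few weighted, lagged unit Fréchet shocks, so a single large shock produces a cluster of large values. The finite weight vector and the function names are assumptions for illustration; the multivariate maxima of moving maxima model in the talk is richer.

```python
import math
import random


def unit_frechet(rng):
    """Draw a unit Frechet variate by inverting F(z) = exp(-1 / z)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return -1.0 / math.log(u)


def moving_maxima(shocks, weights):
    """Y_t = max over k of weights[k] * shocks[t - k].

    Returns one value for each t that has a full set of lagged shocks.
    If the weights are non-negative and sum to one, each Y_t is again
    unit Frechet when the shocks are independent unit Frechet.
    """
    lags = len(weights)
    return [
        max(weights[k] * shocks[t - k] for k in range(lags))
        for t in range(lags - 1, len(shocks))
    ]


def simulate(n, weights, seed=0):
    """Simulate n values of the moving maxima process."""
    rng = random.Random(seed)
    shocks = [unit_frechet(rng) for _ in range(n + len(weights) - 1)]
    return moving_maxima(shocks, weights)
```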
| DATE/PLACE: |
Tuesday, January 15, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | Robust Multivariate Regression |
| SPEAKER: |
Dr. Stefan Van Aelst Postdoctoral Fellow of the FWO, Flanders (Belgium), University of Antwerp, Dept. of Mathematics and Computer Science, Universiteitsplein 1, B-2610 Wilrijk, Belgium. |
|
It is well known that classical multiple regression is extremely sensitive to outliers in a data set. The same problem holds in the case of multivariate regression. Therefore, we propose more robust methods for multivariate regression and investigate their properties such as influence function and breakdown point.
For a given data set of size n, the multivariate least trimmed squares estimator (MLTS) looks for the subset of size h (0 < ...). Using a C-step theorem similar to that of Rousseeuw and Van Driessen (1999), we construct an efficient algorithm to compute the MLTS estimator. It is shown that the MLTS has a positive breakdown point that depends on the subset size h to be chosen by the user. In the case of elliptical error distributions we derive its influence function, which is unbounded. Good leverage points can have a high effect on the estimator. We investigate the efficiency of the estimator and show that the choice of h is a trade-off between efficiency and breakdown. The use of reweighted versions of the estimator will also be investigated. Another approach is based on robust estimation of the location and scatter of the joint explanatory and response variables. We show that the robustness properties of the resulting multivariate regression estimator are inherited from the robust location and scatter estimator. We concentrate on the minimum covariance determinant estimator (MCD) to robustly estimate the location and scatter. We show that the influence function of the corresponding regression estimator is bounded. We also investigate efficiency and propose reweighted versions of the estimator to increase efficiency.
References:
Rousseeuw, P.J. (1984). "Least median of squares regression," Journal of the American Statistical Association, 79, 871-880.
Rousseeuw, P.J. and Van Driessen, K. (1999). "A fast algorithm for the minimum covariance determinant estimator," Technometrics, 41, 212-223. |
|
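The C-step idea can be sketched in the simplest setting of one explanatory and one response variable, where trimming on the determinant of the residual covariance reduces to trimming on the sum of squared residuals (ordinary least trimmed squares). The code below is an illustrative simplification under that assumption, not the multivariate algorithm from the talk; the random two-point starts, the step cap, and all function names are hypothetical choices.

```python
import random


def least_squares(points):
    """Ordinary least squares line through (x, y) pairs with distinct x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def c_step(points, subset, h):
    """Fit on the subset, then keep the h points with smallest residuals."""
    intercept, slope = least_squares([points[i] for i in subset])
    sq_res = [(y - intercept - slope * x) ** 2 for x, y in points]
    ranked = sorted(range(len(points)), key=sq_res.__getitem__)
    return sorted(ranked[:h])


def objective(points, subset):
    """Sum of squared residuals of the subset under its own fit."""
    intercept, slope = least_squares([points[i] for i in subset])
    return sum(
        (points[i][1] - intercept - slope * points[i][0]) ** 2 for i in subset
    )


def lts_fit(points, h, starts=20, max_steps=50, seed=0):
    """Run C-steps from random two-point starts and keep the best subset."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        subset = sorted(rng.sample(range(len(points)), 2))
        for _ in range(max_steps):  # each C-step cannot increase the objective
            new_subset = c_step(points, subset, h)
            if new_subset == subset:
                break
            subset = new_subset
        value = objective(points, subset)
        if best is None or value < best[0]:
            best = (value, subset)
    return least_squares([points[i] for i in best[1]]), best[1]
```

A smaller `h` tolerates more outliers but uses fewer observations in the final fit, which is the trade-off between breakdown and efficiency mentioned in the abstract.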
| DATE/PLACE: |
Tuesday, January 8, 2002, 16:00 Leonard S. Klinck 301 6356 Agricultural Road, UBC |
| TITLE: | To Infinity and Beyond: Model averaging and model spaces |
| SPEAKER: |
Dr. Bertrand Clarke Dept. of Statistics, UBC |
| Stacking is a way to do model averaging. It is similar to Bayes model averaging except that the weights are no longer posterior probabilities of models; they are obtained by other non-Bayes techniques. Here we develop a series of computed examples to contrast the performance of stacking and Bayes model averaging. In these cases stacking typically outperforms Bayes model averaging. Efforts to explain these cases lead us to revisit several of the basic assumptions that go into the statistical paradigm. | |
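A minimal sketch of stacking for two candidate models follows: a constant-mean model and a straight-line model are combined with a weight chosen to minimize leave-one-out squared prediction error. The two models, the sum-to-one constraint on the weights, and the function names are assumptions for illustration, not the computed examples from the talk.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope (assumes xs are not all equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_predictions(xs, ys):
    """Leave-one-out predictions from the mean model and the line model."""
    mean_preds, line_preds = [], []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        mean_preds.append(sum(train_y) / len(train_y))
        intercept, slope = fit_line(train_x, train_y)
        line_preds.append(intercept + slope * xs[i])
    return mean_preds, line_preds


def stacking_weight(ys, preds_a, preds_b):
    """Weight w in [0, 1] on model A minimizing the squared error of
    w * preds_a + (1 - w) * preds_b against ys."""
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, preds_a, preds_b))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    if den == 0.0:
        return 0.5  # the models agree everywhere, so any weight is optimal
    return min(1.0, max(0.0, num / den))
```

Unlike Bayes model averaging, the weight here comes from out-of-sample prediction error rather than from posterior model probabilities.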
