Spring 2017 UBC/SFU Joint Statistical Seminar
March 25th, 2017
Room 7000 in the Harbour Centre
Presented by PIMS and the Department of Statistics and Actuarial Science, SFU
The SFU-UBC Joint Graduate Student
Workshop in Statistics is going into its 13th year. This is the second
of two seminars to take place this school year: the one in Fall is
organized by graduate students from SFU and the one in Spring by
graduate students from UBC. The idea of this event is to offer
graduate students in Statistics and Actuarial Science an
opportunity to attend a seminar with accessible talks that introduce
them to active areas of research in the field. For two
students from each university, the seminar is also an opportunity to
present their work and to develop their presentation skills
among their peers.
Continuing with the usual format of
past years, this event will consist
of talks given by four students (two from UBC and two from SFU) and
one professor. The seminar also has
important social components, namely the morning coffee and the lunch,
where students can network with each other and foster
a mutually beneficial relationship between the departments.
This seminar could not take place
without the generous help of our
sponsors: the Pacific Institute for the Mathematical Sciences (PIMS)
and the Department of Statistics and Actuarial Science, SFU.
Agenda for March 25th
Coffee and Pastries at Waves (West Hastings Street)
Across the street from the Harbour Centre
UBC Student Talk 1: Nelson Chen
Sequential Computer Experimental Design for Extreme Quantile Estimation
SFU Student Talk 1: Christina Nieuwoudt
SimRVPedigree: An R Package to Simulate Pedigrees Ascertained for Multiple Relatives Affected by a Rare Disease
UBC Faculty Talk: Harry Joe
Statistical computing, software comparisons, data science and numerical methods
SFU Student Talk 2: Jacob Mortensen
Urban Heat Risk Mapping Using Multiple Point Patterns in Houston, Texas
UBC Student Talk 2: Dustin Johnson
Online, Large-scale Modeling of Consumer Brand Perception using Latent Dirichlet Allocation
Lunch at Rogue Kitchen & Wetbar (601 W Cordova St)
Next to the Waterfront SkyTrain station
The presentation slides can be downloaded here
The seminar takes place in room 7000 of the Harbour Centre, on the SFU
campus in downtown Vancouver.
From SFU, the 135 bus will take you directly to the
seminar location. From UBC, the 44 and 14 buses provide direct access.
The venue is also near Waterfront station, which is accessible from all
SkyTrain lines: Canada Line, Expo Line
and Millennium Line.
Sequential Computer Experimental Design for Extreme Quantile Estimation
Computer models are often used to study the reliability of engineering processes. The research interest of this talk is to estimate an extreme quantile of the output of a computer model. We first build a statistical surrogate for the input-output relationship of the numerical model with a modest number of evaluations to obtain initial estimates. We then experiment sequentially, guided by a new criterion, to improve the estimate of the quantile. The newly proposed sequential method is compared with the popular expected improvement (EI) method on a real computer model of a wood floor system, which quantifies the relationship between the Modulus of Elasticity (MOE) of the joists and the corresponding deflection under a static load. We show that the new sequential criterion leads to faster convergence, which is the major contribution of the research. In addition, the uncertainty of the deflection distribution propagated from estimating input distributions is well quantified using different methods. It is shown that the effect of properly modelling the input distribution is much larger than we would expect.
SimRVPedigree: An R Package to Simulate Pedigrees Ascertained for Multiple Relatives Affected by a Rare Disease
Family-based studies are receiving renewed attention because of their ability to identify genetic susceptibility factors associated with rare diseases. These studies have more power to detect rare variants, require smaller sample sizes, and can more accurately detect sequencing errors than case-control studies. However, garnering enough families for analysis of a rare disease could require years of effort, making these studies difficult to replicate. To address this shortcoming we have created an R package, SimRVPedigree, to randomly simulate families ascertained to contain multiple relatives affected by a rare disease. The package aims to mimic the process of family development, while allowing users to incorporate multiple facets of family ascertainment. We illustrate how approximate Bayesian computation with SimRVPedigree may be used to estimate the relative risk of disease for genetic cases in a sample of ascertained families.
Statistical computing, software comparisons, data science and numerical methods
This presentation will discuss statistical computing topics that
students might not get exposed to in courses.
Statistical research and many jobs in statistics / data science
require proficiency in statistical software and computing.
Through examples, some topics of computing will be discussed, such as
(a) comparisons of speed of programming languages and software, (b)
writing code for reproducibility, (c) pseudo-code, (d) profiling code
for bottlenecks, and (e) available resources for numerical optimization,
differentiation and integration.
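As a rough illustration of topics (d) and (e), the sketch below approximates a derivative with a central difference, approximates an integral with composite Simpson's rule, and times one call with a wall-clock timer. It is not taken from the talk: the function names, the step size `h`, the number of subintervals `n`, and the choice of `sin` as a test function are all assumptions made for this example, and the timer is only a crude stand-in for a real profiler.

```python
import math
import time


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a symmetric difference quotient (h is an assumed default)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def simpson(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd index) and 2 (even index)
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def timed(func, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Test function: sin, whose derivative at 0 is 1 and whose integral over [0, pi] is 2
derivative = central_difference(math.sin, 0.0)
integral, seconds = timed(simpson, math.sin, 0.0, math.pi, n=1000)
print(f"d/dx sin(x) at 0 ~ {derivative:.8f}")
print(f"integral of sin on [0, pi] ~ {integral:.8f} (computed in {seconds:.6f} s)")
```

Timing a single call this way is noisy; repeating the call several times, or using a dedicated profiler, would give a more reliable picture of where the bottlenecks are.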
Urban Heat Risk Mapping Using Multiple Point Patterns in Houston, Texas
Extreme heat, or persistently high temperatures in the form of heat waves, adversely impacts human health. To study such effects, risk maps are a common epidemiological tool used to identify regions and populations that are more susceptible to these negative outcomes; however, the negative health effects of high temperatures are manifested differently among different segments of the population. In this paper, we propose a novel, hierarchical marked point process model that merges multiple health outcomes into an overall heat risk map. Specifically, we consider health outcomes of heat stress-related 911 calls and mortalities across the city of Houston, Texas. We show that combining multiple health outcomes leads to a broader understanding of the spatial distribution of heat risk than a single health outcome.
Online, Large-scale Modeling of Consumer Brand Perception using Latent Dirichlet Allocation
The presentation will consist of a brief overview of a consulting project recently undertaken, where the main purpose is to identify a level of consumer sentiment of a company's brand image through the use of social media data and product reviews. The subject of sentiment analysis will be discussed on a broad level, then attention will be focussed primarily on the methodology of latent Dirichlet allocation (LDA), including its ability to generate and model documents through approximating the underlying data distributions. Toy examples demonstrating the application of LDA for natural language processing (NLP) will be covered along with complementary R and Python code. The presentation will conclude with discussions on extensions of LDA for more complex, online settings as well as how to handle storage and optimization constraints of a large amount of unstructured social media data.
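To give a sense of what a toy LDA example might look like, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the code from the talk, and the talk may well rely on a different inference method or an existing library. The tiny corpus, the hyperparameters `alpha` and `beta`, the number of topics, and the function names `lda_gibbs` and `top_words` are all assumptions made for this illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs with a collapsed Gibbs sampler (toy scale only)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, k, n=3):
    """Return the n words most often assigned to topic k."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_word[k][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


# Hypothetical mini-corpus of product-review snippets, already tokenized
docs = [
    "battery screen battery camera screen".split(),
    "camera battery screen charger battery".split(),
    "service delivery refund service support".split(),
    "refund support delivery service refund".split(),
]

vocab, doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k in range(2):
    print(f"topic {k}:", top_words(vocab, topic_word, k))
```

On a corpus this small the sampled topics depend on the seed and the hyperparameters, so the printed word lists should be read as an illustration of the mechanics rather than as a meaningful result.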