Spring 2017 UBC/SFU Joint Statistical Seminar
March 25th, 2017
Room 7000 in the Harbour Centre
Presented by PIMS and the Department of Statistics and Actuarial Science, SFU


Overview
The SFU-UBC Joint Graduate Student Workshop in Statistics is going into its 13th year. This is the second of the two seminars held this school year: the Fall seminar is organized by graduate students from SFU, and the Spring seminar by graduate students from UBC. The event offers graduate students in Statistics and Actuarial Science an opportunity to attend accessible talks that introduce active areas of research in the field. It also gives two students from each university the chance to present their work and to develop their presentation skills in front of their peers.

Continuing the format of past years, this event will consist of talks by four students (two from UBC and two from SFU) and one professor. The seminar also has important social components, namely the morning coffee and the lunch, where students can network with each other and foster a mutually beneficial relationship between the departments.


Sponsorship

This seminar could not take place without the generous help of our sponsors: the Pacific Institute for the Mathematical Sciences (PIMS) and the Department of Statistics and Actuarial Science, SFU.

Agenda for March 25th
8:30-9:00
Coffee and Pastries at Waves Coffee (492 West Hastings Street)
Across the street from the Harbour Centre

9:00-9:30
UBC Student Talk 1: Nelson Chen
Sequential Computer Experimental Design for Extreme Quantile Estimation

9:30-10:00
SFU Student Talk 1: Christina Nieuwoudt
SimRVPedigree: An R Package to Simulate Pedigrees Ascertained for Multiple Relatives Affected by a Rare Disease

10:00-11:00
UBC Faculty Talk: Harry Joe
Statistical computing, software comparisons, data science and numerical methods

11:00-11:30
SFU Student Talk 2: Jacob Mortensen
Urban Heat Risk Mapping Using Multiple Point Patterns in Houston, Texas

11:30-12:00
UBC Student Talk 2: Dustin Johnson
Online, Large-scale Modeling of Consumer Brand Perception using Latent Dirichlet Allocation

12:00-2:00
Lunch at Rogue Kitchen & Wetbar (601 W Cordova St)
Next to the Waterfront SkyTrain station

The presentation slides can be downloaded here


Directions and Accessibility
The seminar takes place in room 7000 of the Harbour Centre, on SFU's downtown campus in Vancouver. From SFU, the 135 bus goes directly to the seminar location. From UBC, the 044 and 14 buses provide direct access. The Harbour Centre is also near Waterfront Station, which connects to all SkyTrain lines: the Canada Line, Expo Line and Millennium Line.



Sequential Computer Experimental Design for Extreme Quantile Estimation

Computer models are often used to study the reliability of engineering processes. This talk focuses on estimating an extreme quantile of the output of a computer model. We first build a statistical surrogate for the input-output relationship of the numerical model from a modest number of evaluations to obtain initial estimates. We then experiment sequentially, guided by a new criterion, to improve the estimate of the quantile. The newly proposed sequential method is compared with the popular expected improvement (EI) method on a real wood computer model that quantifies the relationship between the Modulus of Elasticity (MOE) of joists and the corresponding deflection of a floor system under a static load. We show that the new sequential criterion leads to faster convergence, which is the major contribution of the research. In addition, the uncertainty in the deflection distribution that propagates from estimating the input distributions is well quantified using different methods. The results show that properly modelling the input distribution has a much larger effect than we would expect.
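For reference, the expected improvement (EI) baseline mentioned above has, in its standard minimization form, a closed form in terms of a Gaussian surrogate's predictive mean μ and standard deviation σ at a candidate input, relative to the best output found so far. The Python sketch below implements only that textbook formula; how the talk adapts EI to quantile estimation, and the new criterion itself, are not described in the abstract and are not reproduced here. The function names and the candidate predictions are illustrative assumptions.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """Textbook EI for minimisation: E[max(f_best - Y, 0)] with Y ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical surrogate predictions (mu, sigma) at three candidate inputs.
candidates = {"x1": (0.20, 0.05), "x2": (0.50, 0.40), "x3": (0.10, 0.01)}
f_best = 0.15  # hypothetical best output observed so far

# Sequential design step: run the computer model next where EI is largest.
next_input = max(candidates, key=lambda k: expected_improvement(*candidates[k], f_best))
print("next run at:", next_input)
```

The first term of EI rewards a low predicted mean and the second rewards high predictive uncertainty, which is why EI is a common default for sequential design.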

SimRVPedigree: An R Package to Simulate Pedigrees Ascertained for Multiple Relatives Affected by a Rare Disease

Family-based studies are receiving renewed attention because of their ability to identify genetic susceptibility factors associated with rare diseases. Compared with case-control studies, they have more power to detect rare variants, require smaller sample sizes, and can detect sequencing errors more accurately. However, gathering enough families to analyze a rare disease could take years of effort, making these studies difficult to replicate. To address this shortcoming, we have created an R package, SimRVPedigree, that randomly simulates families ascertained to contain multiple relatives affected by a rare disease. The package aims to mimic the process of family development, while allowing users to incorporate multiple facets of family ascertainment. We illustrate how approximate Bayesian computation with SimRVPedigree may be used to estimate the relative risk of disease for genetic cases in a sample of ascertained families.
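To show the general idea of approximate Bayesian computation (ABC) mentioned above, the Python sketch below runs basic rejection ABC for a relative risk: draw a candidate value from a prior, simulate data under it, and keep it when a summary of the simulated data is close to the observed summary. It does not use SimRVPedigree (an R package) or its functions. The simulator is a hypothetical toy that ignores pedigree structure and ascertainment, and the prior range, tolerance, baseline risk and "observed" data are all illustrative assumptions.

```python
import random
import statistics


def simulate_mean_affected(rr, n_families, family_size, base_risk, rng):
    """Hypothetical toy simulator (NOT SimRVPedigree): each relative is
    independently affected with probability min(1, rr * base_risk).
    Returns the mean number of affected relatives per family."""
    p = min(1.0, rr * base_risk)
    counts = [sum(rng.random() < p for _ in range(family_size))
              for _ in range(n_families)]
    return statistics.fmean(counts)


def abc_rejection(observed, n_draws, tol, rng, n_families=50,
                  family_size=10, base_risk=0.01, rr_range=(1.0, 50.0)):
    """Rejection ABC: draw rr from a uniform prior, simulate a summary,
    and accept rr when it lies within tol of the observed summary."""
    accepted = []
    for _ in range(n_draws):
        rr = rng.uniform(*rr_range)
        sim = simulate_mean_affected(rr, n_families, family_size, base_risk, rng)
        if abs(sim - observed) <= tol:
            accepted.append(rr)
    return accepted


rng = random.Random(2017)
# Illustrative "observed" summary, generated here from rr = 20.
observed = simulate_mean_affected(20.0, 50, 10, 0.01, rng)
posterior = abc_rejection(observed, n_draws=2000, tol=0.1, rng=rng)
if posterior:
    print(f"accepted {len(posterior)} draws; "
          f"approximate posterior mean rr = {statistics.fmean(posterior):.1f}")
```

The accepted draws approximate the posterior of the relative risk; a realistic analysis would replace the toy simulator with one that generates ascertained pedigrees and would use richer summary statistics.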

Statistical computing, software comparisons, data science and numerical methods

This presentation will discuss statistical computing topics that students might not be exposed to in courses. Statistical research and many jobs in statistics/data science require proficiency in statistical software and computing. Through examples, the talk will cover computing topics such as (a) comparing the speed of programming languages and software, (b) writing code for reproducibility, (c) pseudo-code, (d) profiling code for bottlenecks, and (e) available resources for numerical optimization, differentiation and integration.
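As a small illustration of topics (a) and (d), the Python sketch below times three implementations of the same computation with the standard-library `timeit` module and profiles two of them with `cProfile`/`pstats`. The example functions and problem size are hypothetical choices for illustration, not material from the talk.

```python
import cProfile
import io
import pstats
import timeit


def loop_sum_of_squares(n):
    """Explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def builtin_sum_of_squares(n):
    """Generator expression passed to the built-in sum()."""
    return sum(i * i for i in range(n))


def closed_form_sum_of_squares(n):
    """Closed form: sum_{i=0}^{n-1} i^2 = (n - 1) n (2n - 1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def compare_timings(n=100_000, repeats=5):
    """(a) Speed comparison: best-of-`repeats` wall time for each version."""
    results = {}
    for fn in (loop_sum_of_squares, builtin_sum_of_squares, closed_form_sum_of_squares):
        results[fn.__name__] = min(timeit.repeat(lambda: fn(n), number=1, repeat=repeats))
    return results


def profile_report(n=100_000):
    """(d) Profiling: which calls account for the running time?"""
    profiler = cProfile.Profile()
    profiler.enable()
    loop_sum_of_squares(n)
    builtin_sum_of_squares(n)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return out.getvalue()


for name, seconds in compare_timings().items():
    print(f"{name:28s} {seconds:.6f} s")
print(profile_report())
```

Taking the minimum over repeats reduces noise from other processes, and checking that all versions return the same value before timing them guards against comparing implementations that compute different things.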

Urban Heat Risk Mapping Using Multiple Point Patterns in Houston, Texas

Extreme heat, or persistently high temperatures in the form of heat waves, adversely impacts human health. To study such effects, risk maps are a common epidemiological tool for identifying regions and populations that are more susceptible to these negative outcomes; however, the negative health effects of high temperatures manifest differently among different segments of the population. In this paper, we propose a novel hierarchical marked point process model that combines multiple health outcomes into an overall heat risk map. Specifically, we consider two health outcomes across the city of Houston, Texas: heat stress-related 911 calls and deaths. We show that combining multiple health outcomes leads to a broader understanding of the spatial distribution of heat risk than any single health outcome alone.
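The hierarchical marked point process model itself is not specified in the abstract. As a hedged illustration of the kind of data such a model describes, the Python sketch below simulates two spatially varying point patterns, one per hypothetical health outcome, on a unit-less 10 x 10 window using Lewis-Shedler thinning, and tags each point with its outcome as a mark. The Gaussian "hot spot" intensities, their parameters and the outcome labels are assumptions made for illustration only.

```python
import math
import random


def poisson_count(mean, rng):
    """Poisson(mean) draw: count unit-rate exponential arrivals in [0, mean)."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def thinning(intensity, lam_max, width, height, rng):
    """Lewis-Shedler thinning: simulate a homogeneous process with rate lam_max
    on [0, width] x [0, height], then keep each point with probability
    intensity(x, y) / lam_max. Requires intensity <= lam_max everywhere."""
    points = []
    for _ in range(poisson_count(lam_max * width * height, rng)):
        x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
        if rng.random() < intensity(x, y) / lam_max:
            points.append((x, y))
    return points


def hotspot(peak, cx, cy, scale):
    """Hypothetical Gaussian-bump intensity with maximum `peak` at (cx, cy)."""
    def intensity(x, y):
        return peak * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * scale ** 2))
    return intensity


rng = random.Random(42)
# Hypothetical outcomes: (intensity function, its maximum used as lam_max).
outcomes = {
    "heat_911_call": (hotspot(5.0, 5.0, 5.0, 1.5), 5.0),
    "heat_death": (hotspot(1.0, 6.0, 4.0, 1.0), 1.0),
}
marked_points = [
    (x, y, mark)
    for mark, (lam, lam_max) in outcomes.items()
    for (x, y) in thinning(lam, lam_max, 10.0, 10.0, rng)
]
for mark in outcomes:
    print(mark, sum(1 for _, _, m in marked_points if m == mark))
```

A joint model for such marked patterns can let the outcomes share spatial structure, which is one way to combine them into a single risk surface.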

Online, Large-scale Modeling of Consumer Brand Perception using Latent Dirichlet Allocation

The presentation will give a brief overview of a recent consulting project whose main purpose was to gauge consumer sentiment toward a company's brand image using social media data and product reviews. Sentiment analysis will be discussed broadly, and then attention will be focussed primarily on the methodology of latent Dirichlet allocation (LDA), including its ability to generate and model documents by approximating the underlying data distributions. Toy examples demonstrating the application of LDA to natural language processing (NLP) will be covered, along with complementary R and Python code. The presentation will conclude with a discussion of extensions of LDA to more complex, online settings, and of how to handle the storage and optimization constraints of large amounts of unstructured social media data.
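The "generate documents" side of LDA mentioned above is its generative process. For each document, draw topic proportions θ from a Dirichlet distribution. Then, for each word, draw a topic from θ and a word from that topic's word distribution. The Python sketch below follows that process using only the standard library. The vocabulary, the two topics and the Dirichlet parameter are hypothetical choices for illustration, not the project's data or code.

```python
import random

# Hypothetical vocabulary and topic-word distributions (each row sums to 1).
VOCAB = ["price", "cheap", "refund", "love", "quality", "broken"]
TOPICS = [
    [0.40, 0.30, 0.20, 0.05, 0.03, 0.02],  # a "value for money" topic
    [0.02, 0.03, 0.05, 0.30, 0.40, 0.20],  # a "product quality" topic
]


def dirichlet(alpha, rng):
    """Sample Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, topics, vocab, rng):
    """LDA generative process for one document:
    1. theta ~ Dirichlet(alpha)          (document's topic proportions)
    2. for each word position:
       z ~ Categorical(theta)            (topic assignment)
       w ~ Categorical(topics[z])        (word drawn from that topic)"""
    theta = dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]
        words.append(rng.choices(vocab, weights=topics[z])[0])
    return theta, words


rng = random.Random(7)
docs = [generate_document(8, [0.5, 0.5], TOPICS, VOCAB, rng) for _ in range(3)]
for theta, words in docs:
    print([round(t, 2) for t in theta], " ".join(words))
```

Fitting LDA reverses this process: it infers θ and the topic-word distributions from observed text. For example, scikit-learn's `LatentDirichletAllocation` supports both batch and online variational learning, and the online variant is relevant to the streaming social media setting described above.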