Saturday, March 16th 2019
Room 2270 in the Harbour Centre
555 West Hastings Street, Vancouver, BC V6B 4N4
The Winter 2019 SFU/UBC Joint Seminar is the second of two seminars taking place in the 2018/2019 school year between the UBC Department of Statistics and the SFU Department of Statistics and Actuarial Science. It is intended to give graduate students in these departments a chance to raise awareness about active areas of research in Statistics, network with other students and faculty members, and share their research as speakers. The Fall seminar is organized by graduate students from SFU, and the Winter seminar is organized by graduate students from UBC.
The seminar consists of talks given by three graduate students from each department and one faculty member from UBC. Morning coffee and lunch are provided to seminar participants. Information about previous seminars can be found here.
Time | Event |
---|---|
8:30-9:00 | Coffee and Pastries @ Waves Coffee |
9:00-9:20 | Title: Document clustering for small datasets using paragraph embeddings and Latent Dirichlet Allocation. Speaker: Maude Lachaine, SFU Abstract: As the hype around AI intensifies, non-technical clients can set unrealistically high expectations of modern methods, often with less-than-ideal datasets. This is a case study highlighting our efforts at unsupervised document classification on a small number of mission statements sent by community art centres. The primary goal, to classify the art establishments based on the clientele they serve, was achieved by leveraging information from the limited dataset using customized preprocessing techniques, word embeddings with n-grams and paragraph embeddings (Doc2Vec). This allowed for a measure of similarity between certain keywords and the documents, used as a classification score. The accuracy was evaluated against a randomly selected subset of hand-labeled documents. Clustering and visualization techniques were also used to further understand underlying patterns in the data. This exercise is the result of consulting work for SFU's Big Data Hub. |
9:20-9:40 | Title: Secure Multi-party Computing: A modern solution for distrust among data providers. Speaker: Lucy Mosquera, UBC Abstract: As statisticians, we are regularly privileged with access to sensitive data. What happens when multiple data providers can't agree to trust a statistician? Secure multi-party computation (SMC) provides the means for statisticians to perform analysis on encrypted data from multiple sources, without decrypting the data. This talk will provide an introduction to encrypting data using Shamir's secret sharing, examples of SMC implementation, results of a comparison study, and information on the user experience. This talk aims to introduce statisticians to this developing field as a technology they may encounter through their careers. |
9:40-10:00 | Title: The Routes to Success. Speaker: James Thomson, SFU Abstract: The NFL, in conjunction with Next Gen Stats, provided six weeks of player tracking data from the 2017 NFL season, capturing the real-time location of players on the field. Using functional clustering, we attempted to identify the individual routes run by each player on every passing play from the first six weeks of the 2017 season, and to find combinations of routes that lead to successful plays. The idea of space created by an offence was explored as well. By using the speed and direction of each player on the field, an approximate “zone of control” was estimated to visualize how open the targeted receiver was in the time leading up to a successful or failed throw. We hope to use this as a new method of measuring a quarterback's decision-making ability. |
10:00-10:10 | Break |
10:10-10:30 | Title: On Split Estimation. Speaker: Anthony Christidis, UBC Abstract: From a theoretical standpoint, it can be shown that splitting variables can reduce the variance of linear functions of the regression coefficient estimate. Splitting combined with shrinkage can result in estimators with smaller mean squared error compared to popular shrinkage estimators such as Lasso, ridge regression, elastic net, or other penalized regression estimates. In this talk, a number of approaches to searching for the optimal split of variables will be discussed. In particular, emphasis will be placed on a new split estimation method, SplitReg: the optimal split of the variables into groups and the regularized estimation of the regression coefficients are performed by minimizing an objective function that encourages sparsity within each group and diversity among them. Our procedure works on top of a given penalized linear regression estimator (e.g., Lasso, elastic net) by fitting it to possibly overlapping groups of features, encouraging diversity among these groups to reduce the correlation of the corresponding predictions. For the case of an elastic net penalty and orthogonal predictors, we give a closed-form solution for the regression coefficients in each group. We establish the consistency of our method with the number of predictors possibly increasing with the sample size. An extensive simulation study and real-data applications show that in general the proposed method improves the prediction accuracy of the base estimator used in the procedure. Possible extensions to GLMs and other models are discussed. Robustification of the method will also be discussed in the context of SplitReg. |
10:30-10:50 | Title: Super fast emulation and calibration of large computer experiments with high-dimensional output. Speaker: Grace Hsu, SFU Abstract: Scientific investigations are often expensive and the ability to quickly perform analysis of data on-location at experimental facilities can save valuable resources. Further, computer models that leverage scientific knowledge can be used to gain insight into complex processes and reduce the need for costly physical experiments, but in turn may be computationally expensive to run. We compare multiple statistical surrogates or emulators based on Gaussian processes for expensive computer models, with the goal of producing predictions quickly given large training sets. We then present a modularised approach for finding the values of inputs that allow for the surrogate model to match reality, or field observations. This process is model calibration. We then extend the emulator of choice and calibration procedure for use with high-dimensional response and demonstrate their speed and efficacy on datasets from a series of transmission impact experiments. |
10:50-11:20 | Title: The squared error has friends, too! Guest Speaker: Dr. Vincenzo Coia, UBC Abstract: Why do we consider the squared error in regression? It's not an arbitrary choice! Come see what you're actually getting when you minimize the sum of squared errors, and see what you *could* get by minimizing other things. |
11:20-11:30 | Break |
11:30-12:30 | Title: How to Explain Things: Guidelines for Effective Scientific Communication. Speaker: Professor Trevor Campbell, UBC Abstract: After being neck-deep in infinite-dimensional minimax heavy-tailed basket-weaving theory for months now, you've finally cracked your theorem! It's time to write a paper about it. But how to start? The work is so specific and technical, it would take a full manuscript to simply introduce the background material. How can you convey your work in a way that convinces reviewers that it's an important contribution? |
12:30-13:30 | Lunch @ Steamworks Gastown |