*To join this seminar via Zoom, attendees will need to request connection details from headsec [at] stat.ubc.ca.
Abstract: Spatio-temporal statistical methods are widely used to model natural phenomena across both space and time. Example phenomena include the concentrations of airborne pollutants and the distributions of endangered species. A spatio-temporal process is said to have been preferentially sampled when the locations and/or times chosen to observe it depend stochastically on the values of the process at the chosen locations and/or times. When standard statistical methodologies are used, predictions of a preferentially sampled spatio-temporal process into unsampled regions and times may be severely biased.
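As a rough illustration of the definition above (not the speaker's method), the following minimal sketch simulates a hypothetical one-dimensional "pollution" field and samples sites with probability proportional to exp(beta * value). The field, the exponential weighting, the function name `simulate`, and all parameter values are assumptions chosen purely for demonstration. Setting beta to 0 gives ordinary uniform sampling, while a positive beta favours high-valued sites.

```python
import math
import random


def simulate(beta, n_sites=2000, n_samples=200, seed=0):
    """Return (true_mean, sample_mean) for a toy preferentially sampled field.

    All modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Hypothetical smooth field on [0, 1] with a little measurement noise.
    xs = [i / (n_sites - 1) for i in range(n_sites)]
    field = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.1) for x in xs]
    # Selection weights depend on the field value itself:
    # beta = 0 -> uniform sampling; beta > 0 -> high values are favoured.
    weights = [math.exp(beta * z) for z in field]
    # Sites are drawn with replacement, which is adequate for a sketch.
    chosen = rng.choices(range(n_sites), weights=weights, k=n_samples)
    sample_mean = sum(field[i] for i in chosen) / n_samples
    true_mean = sum(field) / n_sites
    return true_mean, sample_mean


if __name__ == "__main__":
    for beta in (0.0, 2.0):
        true_mean, sample_mean = simulate(beta)
        print(f"beta={beta}: true mean {true_mean:.3f}, "
              f"naive sample mean {sample_mean:.3f}")
```

Under these assumptions, the naive sample mean should sit close to the true field mean when beta is 0 and well above it when beta is positive, which is the kind of bias the abstract warns about when standard methods ignore how sites were chosen.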
In this talk, we begin by providing a visual demonstration of preferential sampling in continuous-space, discrete-space, and point-pattern data. Next, we argue that preferential sampling is highly prevalent in real-world data, and that in some cases national laws may even dictate that data be preferentially sampled. Following this, we introduce two case studies: estimating historical UK black smoke pollution levels using data collected from a preferentially sampled air quality network, and estimating the space-use of an endangered ecotype of killer whales using sightings data collected from the commercial whale-watching industry. For the first dataset, we develop a fast, intuitive, powerful, and general test to confirm the presence of preferential sampling. Then, to estimate bias-corrected air pollution levels across the UK, we develop the first general framework for modelling preferentially sampled spatio-temporal data. We demonstrate that existing estimates of population-level black smoke exposures may be highly inaccurate due to preferential sampling. Finally, to model killer whale space-use, we develop a point process framework for preferentially sampled spatio-temporal point-pattern data. The resulting maps identify core areas of high activity, which we hope will prove useful for conservation purposes.
Statisticians from almost every domain routinely scrutinise data collection protocols before analysing data. Yet within the domain of spatio-temporal modelling, few questions are typically asked about how and why the sampled locations and times were chosen. This needs to change. Ultimately, we hope that investigations into preferential sampling will become an essential component of spatio-temporal analyses, akin to model diagnostics. The methods presented in this talk are widely applicable, allowing researchers to perform such investigations routinely.