Papers suggested by Jim Zidek


In the 2020/21 academic year, I will be available to handle one Stat 548 paper in each of October, November, February and March. The assessment criteria for Stat 548 papers are posted on the web, but in general I am hoping to see well-written reports that explain the results in the papers and how they were obtained, and that offer additional examples, identify gaps and shortcomings in the work, and make interesting proposals for future work.


The papers:

·  Diggle, P.J., Menezes, R. and Su, T-L. (2010). Geostatistical inference under preferential sampling. Appl. Statist. 59, 191-232. This paper, a landmark in spatial statistics, describes and assesses the potential effects of selecting sites in a biased way for a network designed to monitor an environmental or spatial process. An example is a set of urban ozone monitors set up specifically to detect noncompliance with regulatory standards, so as to protect human health. Ironically, the data from such a monitoring network will underestimate the effects of ozone. The paper sparked interest in a topic that has since become a very active research area in environmetrics. PAPER HAS BEEN TAKEN FOR 2020/2021
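To make the bias concrete, here is a minimal simulation sketch, assuming a hypothetical one-dimensional field built as a moving average of Gaussian noise. Monitor sites are drawn with probability proportional to exp(beta * S(x)), loosely in the spirit of the paper's setting in which where we sample depends on the field itself; the field, beta and the sample sizes are illustrative choices, not taken from the paper. The naive average at the preferentially chosen sites overstates the field's true mean level.

```python
import math
import random
from statistics import fmean


def smooth_field(n_sites, window, rng):
    """Hypothetical 1-D spatial field: a moving average of Gaussian noise,
    rescaled so each site has unit variance; nearby sites are correlated."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_sites + window)]
    return [fmean(noise[i:i + window]) * math.sqrt(window) for i in range(n_sites)]


def weighted_mean(values, weights):
    """Mean of the field as seen by a design that samples with these weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(1)
field = smooth_field(2000, 25, rng)

# Preferential design (assumed form): the chance of siting a monitor grows
# like exp(beta * S(x)), so sites with high values are over-represented.
beta = 1.5
weights = [math.exp(beta * s) for s in field]

uniform_sites = rng.choices(field, k=200)
preferential_sites = rng.choices(field, weights=weights, k=200)

print(f"true mean of field:         {fmean(field):.2f}")
print(f"mean at uniform sites:      {fmean(uniform_sites):.2f}")
print(f"mean at preferential sites: {fmean(preferential_sites):.2f}")
print(f"expected under preferential design: {weighted_mean(field, weights):.2f}")
```

Because the weights increase with the field value, the preferential design's expected mean always exceeds the field's true mean; the size of the gap grows with beta.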

·  Shen, W., Davis, T., Lin, D.K.J. and Nachtsheim, C.J. (2014). Dimensional analysis and its applications in statistics. J. of Quality Technology, 46, 185-198. This paper concerns an important but often neglected topic in the application of statistics, namely the dimensions, scales and units of measurement in which the data were collected. No doubt this neglect is partly because data in datasets don't come with units attached. But perhaps it's also due to the training statisticians receive, which is heavily based on mathematical and computational formalities. As a result, many of the tables and graphs one sees in publications are meaningless because the units are not attached. A more fundamental issue, the one addressed in this paper, is that some models are invalid because they fail to respect these dimensions. Moreover, in many cases, recognizing the limitations that the dimensions impose on models actually simplifies the process of modelling the data. This paper is a well-recognized contribution to the literature on that topic. It would be great if you could come up with an additional example!
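One standard tool in dimensional analysis is Buckingham's Pi theorem: the dimensionless groups that a dimensionally valid model can depend on correspond to the null space of the matrix of dimension exponents. The sketch below is a minimal illustration, assuming the textbook pendulum (period, length, mass, gravitational acceleration g) rather than any example from the paper. It recovers the single group period^2 * g / length, so the period must be proportional to sqrt(length / g) and cannot depend on mass.

```python
from fractions import Fraction


def null_space(matrix):
    """Basis for the null space of a small integer matrix, using exact
    rational Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break  # every row has a pivot; remaining columns are free
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for f in (c for c in range(n_cols) if c not in pivot_cols):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for i, p in enumerate(pivot_cols):
            vec[p] = -rows[i][f]
        basis.append(vec)
    return basis


# Pendulum: each column gives a variable's exponents of mass M, length L, time T.
variables = ["period", "length", "mass", "g"]
dims = [
    [0, 0, 1, 0],   # M
    [0, 1, 0, 1],   # L
    [1, 0, 0, -2],  # T
]
for vec in null_space(dims):
    # prints: period^2 * length^-1 * g^1
    print(" * ".join(f"{name}^{e}" for name, e in zip(variables, vec) if e != 0))
```

Each null-space vector is one dimensionless group; with four variables and three independent dimensions there is exactly one, which is why the model simplifies so sharply.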

·  Wakefield, J. and Shaddick, G.S. (2006). Health-exposure modeling and the ecological fallacy. Biostatistics, 7, 438-455. This paper concerns an important issue in the application of statistics, notably in health-related research such as epidemiology. The issue arises when data must be aggregated, as happens, for example, when official statistics such as numbers of deaths are reported. In that example, administrative records may give those numbers for districts, while the levels of a hazardous substance, such as air pollution, are measured at a few specific locations in each district. The ecological fallacy may then arise: the association between the two sorts of data may be negative at the aggregate level but positive when smaller subregions are analysed using the same data. Hence this important phenomenon has been much studied, and this paper is a well-recognized contribution to the literature on that subject.
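A toy numerical illustration of that sign reversal, using hypothetical numbers rather than anything from the paper: three districts whose average exposure and average death rate move in opposite directions, while within each district the subregions with higher exposure have higher rates. It is only a sketch of the phenomenon, not Wakefield and Shaddick's model.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical districts: (mean exposure, mean death rate), trending downwards.
district_means = {"A": (1.0, 3.0), "B": (2.0, 2.0), "C": (3.0, 1.0)}

# Within each district, subregions with higher exposure have higher rates.
offsets = (-0.5, 0.0, 0.5)
subregions = {
    name: [(x + d, y + d) for d in offsets]
    for name, (x, y) in district_means.items()
}

# Aggregate (ecological) analysis: one point per district.
agg_x = [sum(x for x, _ in pts) / len(pts) for pts in subregions.values()]
agg_y = [sum(y for _, y in pts) / len(pts) for pts in subregions.values()]
print("district-level correlation:", round(pearson(agg_x, agg_y), 3))  # -1.0

# Finer-scale analysis: correlation within each district.
for name, pts in subregions.items():
    xs, ys = zip(*pts)
    print(f"within district {name}:", round(pearson(xs, ys), 3))  # 1.0
```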

 

·      Evans, J.W., Johnson, R.A. and Green, D.W. (1984). Estimating the correlation between variables under destructive testing, or how to break the same board twice. Technometrics, 26, 285-290.

This paper illustrates the magic of statistics. It concerns a problem that arises in structural engineering, where the strength of structural members such as a piece of lumber plays an important role in determining the strength of a structure, e.g. the LSK building at UBC! One measure of strength is the load that breaks the member when it is stretched (its tensile strength); another is its bending strength. Knowing the relationship between these two measures would be valuable, since one could then be predicted from the other, potentially eliminating the need to measure both (as in the case of carbon fibre panels for Airbus aeroplane wings). The problem is that you cannot break the same specimen twice. Or can you? This paper shows that the answer can be yes.
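One way a design can "break the same board twice" is proof loading: each board is loaded in one mode only up to a fixed proof level, boards that fail reveal that strength, and the survivors are then broken in the other mode. The sketch below is a simplified moment estimator, assuming standardised bivariate normal strengths with known margins; it is not necessarily the estimator in Evans, Johnson and Green, and a full likelihood analysis would also use the boards that failed in tension. It relies on the standard result E[Y | X >= a] = rho * phi(a) / (1 - Phi(a)) for a standard bivariate normal pair (X, Y).

```python
import random
from statistics import NormalDist, fmean


def proof_load_experiment(n, rho, proof_level, seed=0):
    """Simulate standardised (tension, bending) strengths for n boards from a
    bivariate normal with correlation rho. Each board is loaded in tension up
    to proof_level: weaker boards break (tension strength observed); survivors
    are then broken in bending (bending strength observed)."""
    rng = random.Random(seed)
    tension_failures, survivor_bending = [], []
    for _ in range(n):
        tension = rng.gauss(0.0, 1.0)
        bending = rho * tension + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if tension < proof_level:
            tension_failures.append(tension)
        else:
            survivor_bending.append(bending)
    return tension_failures, survivor_bending


def moment_estimate_rho(survivor_bending, n):
    """Moment estimate of rho from E[Y | X >= a] = rho * phi(a) / (1 - Phi(a)),
    assuming bending strength is already standardised (mean 0, sd 1)."""
    p_survive = len(survivor_bending) / n
    a = NormalDist().inv_cdf(1 - p_survive)  # estimated standardised proof level
    inverse_mills = NormalDist().pdf(a) / p_survive
    return fmean(survivor_bending) / inverse_mills


failures, survivors = proof_load_experiment(n=20000, rho=0.6, proof_level=0.0, seed=42)
print(round(moment_estimate_rho(survivors, n=20000), 2))
```

No board is ever broken twice here, yet the correlation is recoverable, because surviving the proof load carries information about tension strength.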

 

·      McClintock, B.T., Johnson, D.S., Hooten, M.B., Ver Hoef, J.M. and Morales, J.M. (2014). When to be discrete: the importance of time formulation in understanding animal movement. Movement Ecology, 1-14.

The explosion in new technology we are seeing today has led to tags for tracking animal movement that are so small that even small birds can be tracked, with the goal of determining where they go and ultimately, perhaps, why they go there. However, the resulting data records can be of huge dimension, well beyond the scope of conventional software for analysis. For example, a single female seal foraging for food can go out to sea for 10 to 20 days before returning to land to feed her pups, thereby creating a data record 700,000 items long. This paper, by a distinguished team of statistical ecologists and statisticians, discusses the problem of modelling animal tracks and when the time domain can be assumed to be discrete rather than continuous. The subject involves stochastic processes, e.g. Brownian motion.
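A continuous-time formulation handles irregularly spaced fixes naturally, because the variance of an increment scales with the time gap. Here is a minimal sketch, assuming the simplest such model, two-dimensional Brownian motion with a hypothetical diffusion parameter sigma, observed at hypothetical irregular times; McClintock et al. consider richer models, and this is not their method. Under Brownian motion each coordinate increment divided by sqrt(dt) is Normal(0, sigma^2), which gives a closed-form maximum likelihood estimate.

```python
import math
import random
from itertools import accumulate


def simulate_brownian_track(times, sigma, seed=0):
    """2-D Brownian motion observed at the given (possibly irregular) times:
    each coordinate's increment over a gap dt is Normal(0, sigma^2 * dt)."""
    rng = random.Random(seed)
    x = y = 0.0
    track = [(times[0], x, y)]
    for t0, t1 in zip(times, times[1:]):
        sd = sigma * math.sqrt(t1 - t0)
        x += rng.gauss(0.0, sd)
        y += rng.gauss(0.0, sd)
        track.append((t1, x, y))
    return track


def mle_sigma(track):
    """Maximum likelihood estimate of sigma: each scaled increment
    dx / sqrt(dt) is Normal(0, sigma^2), so sigma^2 is their mean square."""
    scaled = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        scaled.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) / dt)
    return math.sqrt(sum(scaled) / (2 * len(scaled)))


# Hypothetical irregular fix times: exponential gaps between observations.
rng = random.Random(7)
times = list(accumulate(rng.expovariate(1.0) for _ in range(5000)))
track = simulate_brownian_track(times, sigma=2.0, seed=3)
print(round(mle_sigma(track), 2))
```

A discrete-time random walk with a fixed step distribution would instead implicitly assume equally spaced observations, and its parameters would change meaning if the sampling interval changed.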

 

·  Cameletti, M., Lindgren, F., Simpson, D. and Rue, H. (2012). Spatio-temporal modeling of particulate matter concentration through the SPDE approach. AStA Adv Stat Anal. To appear.

This paper describes a valuable new approach to modelling random spatial fields over large domains, for example, temperature over the earth's surface, where the dimension of the multivariate response vector (each coordinate representing a geographical site) is so large that traditional methods for Bayesian analysis such as MCMC cannot feasibly be used. The approach combines INLA (integrated nested Laplace approximations) for fast Bayesian inference with a link between a stochastic partial differential equation (SPDE) and the famous Matérn covariance model. The application concerns particulate matter, a harmful air pollution field formed from small particles that are strongly linked to adverse health effects such as mortality due to cardiovascular problems. A background in computing and mathematics is desirable, but the paper seems quite readable and self-contained.
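For orientation, the Matérn covariance with variance sigma2, scale kappa and smoothness nu is C(h) = sigma2 * 2^(1-nu) / Gamma(nu) * (kappa h)^nu * K_nu(kappa h), where K_nu is the modified Bessel function of the second kind. In the SPDE approach, a field solving (kappa^2 - Laplacian)^(alpha/2) x = W in d dimensions has Matérn covariance with nu = alpha - d/2. The sketch below evaluates these, assuming the parameterisation above and computing K_nu by a simple numerical integral (in practice one would call a library routine such as scipy.special.kv); the practical-range formula sqrt(8 nu)/kappa, at which the correlation is near 0.1, is a rule of thumb from the SPDE literature.

```python
import math


def bessel_k(nu, x, t_max=20.0, n_steps=2000):
    """Modified Bessel function K_nu(x) for x > 0, via the integral
    K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt (trapezoid rule)."""
    h = t_max / n_steps
    total = 0.5 * math.exp(-x)  # t = 0 term (cosh 0 = 1), half weight
    for i in range(1, n_steps + 1):
        t = i * h
        w = 0.5 if i == n_steps else 1.0
        total += w * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def matern(h, sigma2=1.0, kappa=1.0, nu=1.0):
    """Matern covariance sigma2 * 2^(1-nu) / Gamma(nu) * (kappa h)^nu * K_nu(kappa h)."""
    if h == 0:
        return sigma2
    r = kappa * h
    return sigma2 * 2 ** (1 - nu) / math.gamma(nu) * r ** nu * bessel_k(nu, r)


def spde_smoothness(alpha, d):
    """Matern smoothness implied by (kappa^2 - Laplacian)^(alpha/2) x = W in d dimensions."""
    return alpha - d / 2


def practical_range(nu, kappa):
    """Rule-of-thumb distance at which the Matern correlation is near 0.1."""
    return math.sqrt(8 * nu) / kappa


nu = spde_smoothness(alpha=2, d=2)  # the common 2-D choice gives nu = 1
for dist in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(dist, round(matern(dist, kappa=1.0, nu=nu), 4))
print("practical range:", round(practical_range(nu, kappa=1.0), 3))
```

For nu = 1/2 the Matérn reduces to the exponential covariance sigma2 * exp(-kappa h), which is a convenient check on the numerical Bessel function.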


===================================================