J. Janssen and A.A. Ameli, "A Hydrologic Functional Approach for Improving Large-Sample Hydrology Performance in Poorly Gauged Regions", Water Resources Research, 2021, 57, e2021WR030263, https://doi.org/10.1029/2021WR030263

Task: The paper provides background on some hydrologic data which you will analyze. The data comprise about 40 years of annual waterflow data (there are several different definitions of the response) along with annual predictor variables. Generally speaking, your task is to predict one or more of the responses from the predictors. There is ample scope to try different modelling strategies: e.g., compare regression models (which the authors use) with, say, Gaussian processes (which could handle correlation arising from the time variable and could be implemented via the GaSP package in R). There are also opportunities to explore variable selection, the role of time (is there correlation?), or other interesting questions that appeal to you. You don't have to exercise a vast array of statistical methods. Stay focussed!
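One simple way to start on the question about the role of time is to fit a regression and check whether its residuals, ordered by year, are correlated from one year to the next. The sketch below is illustrative only: it assumes a single predictor and a straight-line fit, and the function names `ols_residuals` and `lag1_autocorrelation` are hypothetical helpers, not part of the project data or of any package.

```python
from statistics import mean


def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b*x (one predictor assumed)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = ybar - b * xbar      # intercept
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def lag1_autocorrelation(r):
    """Lag-1 sample autocorrelation of a series that is ordered by year."""
    rbar = mean(r)
    num = sum((r[t] - rbar) * (r[t - 1] - rbar) for t in range(1, len(r)))
    den = sum((rt - rbar) ** 2 for rt in r)
    return num / den
```

A value near zero suggests little year-to-year dependence left in the residuals, while a value well away from zero suggests that a model with a correlation structure in time (such as a Gaussian process) may be worth comparing against plain regression.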

Comments:

- The data provided for your project do not have exactly the variables appearing in the paper.
- The several responses can be modelled separately.
- Your models may or may not predict well. Don't worry: it is also useful to know what doesn't work here.
- Your report should give a very brief summary of the hydrological objectives, but don't dwell on the scientific details. Rather, your report can concentrate on how and why your analysis is different.
- The project provides insight into a broader ongoing CANSSI research project involving statisticians and hydrologists.
- The task might seem easy: just an undergraduate regression exercise. There is opportunity for higher-level statistical thinking, however, and that is the expectation.

D. R. Jones, M. Schonlau and W. J. Welch, "Efficient Global Optimization of Expensive Black-Box Functions", Journal of Global Optimization, 1998, 13, pp. 455-492, https://doi.org/10.1023/A:1008306431147

Task: The paper describes using a Gaussian process (GP) to guide the search for the global optimum of a function. As updating a GP conditional on data is within the Bayesian paradigm, the paper spawned thousands of others on the general topic of "Bayesian optimization".

- Give a brief review of how the method might achieve the objective of global optimization in an efficient way, along with a critical assessment of any drawbacks.
- While the method is called Bayesian, the implementation is often not fully Bayesian: plug-in estimates of hyperparameters may be used in practice, rather than, say, an MCMC approximation to the posterior of the hyperparameters. Investigate whether a more fully Bayesian approach makes any difference.
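
As a concrete reference point for the review, the search in the paper is guided by the expected improvement criterion, which trades off a low predicted value against high predictive uncertainty. The sketch below evaluates that criterion for a minimization problem at a single candidate point, assuming the GP predictive mean `mu` and standard deviation `s` have already been computed with plug-in hyperparameters. The function name `expected_improvement` and its argument names are illustrative choices, not an API from any package.

```python
import math


def expected_improvement(mu, s, f_min):
    """Expected improvement over the current best value f_min (minimization).

    mu    : GP predictive mean at the candidate point (assumed given)
    s     : GP predictive standard deviation at the candidate point (assumed given)
    f_min : smallest function value observed so far
    """
    if s <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))           # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)    # standard normal PDF
    return (f_min - mu) * cdf + s * pdf
```

Under a more fully Bayesian treatment, one option is to average this quantity over posterior draws of the hyperparameters, each draw giving its own `mu` and `s`, instead of evaluating it once at the plug-in estimates.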
