Bayesian sparse regression for large-scale observational healthcare analytics

Tuesday, December 1, 2020 - 11:00 to 12:00
Akihiko (Aki) Nishimura, Assistant Professor of Biostatistics, Johns Hopkins University
Statistics Seminar

*To join this seminar via Zoom, attendees will need to request connection details from headsec [at]

Post-seminar Q&A: Graduate students are invited to stay after the seminar for a Q&A with the speaker (~12pm to 12:30pm).

Abstract: The growing availability of large healthcare databases presents opportunities to investigate how patients' responses to treatments vary across subgroups. Even with the large cohort sizes found in these databases, however, low incidence rates make it difficult to identify causes of treatment effect heterogeneity among a large number of clinical covariates. Sparse regression provides a potential solution. The Bayesian approach is particularly attractive in our setting, where the signals are weak and heterogeneity across databases is substantial. Applications of Bayesian sparse regression to large-scale data sets, however, have been hampered by the lack of scalable computational techniques. We adapt ideas from numerical linear algebra and computational physics to tackle the critical bottleneck in computing posteriors under Bayesian sparse regression. For linear and logistic models, we develop the conjugate gradient sampler for high-dimensional Gaussians along with the theory of prior-preconditioning. For more general regression and survival models, we develop the curvature-adaptive Hamiltonian Monte Carlo to efficiently sample from high-dimensional log-concave distributions. We demonstrate the scalability of our method on an observational study involving n = 1,065,745 patients and p = 15,779 clinical covariates, designed to compare the effectiveness of the most common first-line hypertension treatments. The large cohort size allows us to detect evidence of treatment effect heterogeneity previously unreported by clinical trials.