What can statisticians learn from the analysis of C. elegans data?

Tuesday, March 15, 2022 - 11:00 to 12:00
Gonzalo E. Mena, Florence Nightingale Bicentennial Fellow and Tutor in Computational Statistics and Machine Learning, Department of Statistics, University of Oxford
Statistics Seminar
Zoom

To join via Zoom: please request connection details from headsec [at] stat.ubc.ca

Title: What can statisticians learn from the analysis of C. elegans data?

Abstract: Modern scientific settings pose unprecedented challenges in processing data efficiently and robustly. These challenges often expose the brittleness of our current tools and dictate the need for new methods. In this talk I will describe new statistical and AI methods motivated by a pressing problem in neuroscience: the need to image entire brains at single-neuron resolution.

Specifically, I will present my contribution to NeuroPAL, a breakthrough technology that enables colorful imaging of every single neuron in the brain of the C. elegans worm. I will describe new methods for two difficult tasks arising in these datasets: neural segmentation and identification. Both tasks reduce to an underlying deconvolution model, a mixture of Gaussians, on which classical methods such as the EM algorithm fall short. Behind the new methods lies a key statistical-physics principle, the so-called Schrödinger bridge: a 'thought experiment' that realizes the solution of an entropy-regularized optimal transport problem. Although proposed in 1932, it has yet to percolate into the mainstream of statistics.
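As a concrete illustration of the entropy-regularized optimal transport problem mentioned above, the following minimal NumPy sketch (an illustrative example, not material from the talk) computes a static Schrödinger bridge between two discrete distributions via Sinkhorn's matrix-scaling iterations:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    a, b : source/target probability vectors (each sums to 1)
    C    : cost matrix, C[i, j] = cost of moving mass from i to j
    eps  : entropic regularization strength
    Returns the coupling P, a static Schrodinger bridge between a and b.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale to match row marginals
        v = b / (K.T @ u)                # rescale to match column marginals
    return u[:, None] * K * v[None, :]   # coupling P = diag(u) K diag(v)

# Example: transport between two small discrete point clouds
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(7, 2)) + 1.0
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
P = sinkhorn(np.full(5, 1 / 5), np.full(7, 1 / 7), C)
print(P.sum(axis=1))   # ~1/5 each: row marginals match a
```

As eps shrinks, the coupling P approaches the unregularized optimal transport plan; larger eps yields a smoother, more diffuse coupling.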

I will first describe some fundamental statistical properties of the Schrödinger bridge that I established: for example, when estimated from samples it enjoys a 1/sqrt(n) convergence rate, avoiding the curse of dimensionality. Second, I will introduce a new loss function based on this principle and show that it is a better optimization objective than the log-likelihood for model-based clustering, reducing pathologies such as bad local optima and inconsistency. As a consequence, a new algorithm derived from this loss, Sinkhorn EM, attains better and more robust neural segmentation performance. Afterwards, I will discuss how these principles can be used to probabilistically identify neurons in C. elegans, yielding meaningful uncertainty quantification in this hard combinatorial setting. Finally, I will comment on how these novel methods have proven useful in other contexts, such as deep learning.
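To make the Sinkhorn EM idea concrete, here is a minimal sketch (a hypothetical reconstruction of the idea, not the speaker's reference implementation) for a spherical Gaussian mixture with known mixing weights: the usual E-step responsibilities are replaced by a Sinkhorn projection whose column marginals are forced to match the known weights, while the M-step remains the standard weighted mean update.

```python
import numpy as np

def sinkhorn_em(X, means, weights, sigma=1.0, n_em=50, n_sink=50):
    """Sketch of Sinkhorn EM for a spherical Gaussian mixture with
    known mixing weights (illustrative only).

    X       : (n, d) data
    means   : (K, d) initial component means
    weights : (K,) known mixing weights, summing to 1
    """
    n, d = X.shape
    for _ in range(n_em):
        # Log-density of each point under each component (spherical cov.)
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        logR = -sq / (2 * sigma ** 2)
        # E-step replaced by Sinkhorn scaling in the log domain: find
        # responsibilities R whose rows sum to 1/n and whose columns sum
        # to the known weights, instead of the usual per-point posteriors.
        for _ in range(n_sink):
            logR -= np.logaddexp.reduce(logR, axis=1, keepdims=True)
            logR += np.log(1 / n)                       # row marginals
            logR -= np.logaddexp.reduce(logR, axis=0, keepdims=True)
            logR += np.log(weights)[None, :]            # column marginals
        R = np.exp(logR)
        # M-step: means are responsibility-weighted averages, as in plain EM.
        means = (R.T @ X) / R.sum(axis=0)[:, None]
    return means

# Example usage on synthetic data with two components of known weights
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, (70, 2)), rng.normal(4, 1, (30, 2))])
means = sinkhorn_em(X, means=np.array([[1.0, 0.0], [2.0, 0.0]]),
                    weights=np.array([0.7, 0.3]))
print(means)   # should approach the true centers near (0, 0) and (4, 4)
```

Constraining the column marginals to the known weights is what distinguishes this E-step from the classical one, and is one way the entropic optimal transport view can help EM escape bad local optima.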