# van Eeden seminar: Calcium imaging, clustering, and corncob

Tuesday, March 3, 2020 - 11:00 to 12:00
van Eeden Invited Speaker Daniela Witten, Professor of Statistics and Biostatistics at the University of Washington, and the Dorothy Gilford Endowed Chair in Mathematical Statistics
Statistics Seminar
Lecture Room 4, Woodward IRC Building, 2194 Health Sciences Mall

**Note: This talk is in an unusual location (the Woodward IRC Building at 2194 Health Sciences Mall). Consider giving yourself some extra time to walk to the building and find the seminar room (Lecture Room 4 on the main floor).**

There will be a pre-talk reception in the Woodward IRC lobby at 10:30am.

**********

Abstract: Since this is a student-invited seminar, I'm going to highlight three research projects led by my three senior PhD students. Each project is motivated by a distinct problem in biology.

First, calcium imaging data is transforming the field of neuroscience by making it possible to assay the activities of large numbers of neurons simultaneously. For each neuron, the resulting "fluorescence trace" can be seen as a noisy surrogate of its spikes over time. In order to deconvolve a fluorescence trace into the underlying spike times, we consider an auto-regressive model for calcium dynamics. This leads naturally to a seemingly intractable $\ell_0$ optimization problem. I will show that it is in fact possible to efficiently solve this optimization problem for the global optimum, leading to substantial improvements over competing approaches. I will also talk about quantifying uncertainty associated with these spike estimates.

Second, across many areas of biology, it is becoming increasingly common to collect "multi-view data": that is, data in which multiple data types (e.g. gene expression, DNA sequence, clinical measurements) have been measured on a single set of observations (e.g. patients). I will consider the following question: given a set of n observations with measurements on L data types, can a single clustering of the n observations be defined on all L data types, or does each data type have its own clustering of the observations? To answer this question, I will introduce a general framework for modeling multi-view data, as well as hypothesis tests that can be used in order to characterize the extent to which the clusterings on each of the L data types are the same or different.

Finally, I will consider a fundamental question that arises in the analysis of microbial ecology data: how can we determine whether the abundance of a given taxon differs across conditions?

Sean Jewell, Lucy Gao, and Bryan Martin are fifth-year PhD students at the University of Washington who carried out the work described in this talk.

**********

This talk is supported by the van Eeden fund, the Department of Statistics, and PIMS.