A highly scalable approach to topic modelling in single-cell data by approximate pseudobulk projection

Title: A highly scalable approach to topic modelling in single-cell data by approximate pseudobulk projection
Publication Type: Unpublished
Year of Publication: 2024
Authors: Subedi, S, Sumida, TS, Park, YP
Series Title: bioRxiv
Pagination: 2024.02.21.581497
Abstract: Probabilistic topic modelling has become essential in many types of single-cell data analysis. From the probabilistic topic assignments in each cell, we identify a latent representation of cellular states, and the topic-specific gene frequency vectors provide interpretable bases that can be compared with known cell-type-specific marker genes. However, fitting a topic model on a large number of cells requires heavy computational resources: specialized computing units, long computing time and large amounts of memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating Single-cell data by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods, yet it requires orders of magnitude less computing time and much less memory. We also show that our approach is widely applicable to atlas-scale data analysis: it seamlessly integrates single-cell and bulk data in a joint analysis without requiring additional preprocessing or feature selection steps.

Competing Interest Statement: The authors have declared no competing interest.
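The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of the general idea named in the title, built on our own assumptions and not the authors' ASAP implementation. We assume that cells are grouped into pseudobulk samples by randomly projecting log-normalised profiles and binning them on the sign of each projected coordinate relative to its median. Topics are then fitted on the much smaller pseudobulk matrix with Poisson (KL-divergence) NMF using Lee-Seung multiplicative updates, and every cell is projected back onto the fixed topic gene-frequency vectors. The function name `pseudobulk_topics` and its parameters (`n_proj`, `n_iter`, `eps`) are hypothetical.

```python
import numpy as np

def pseudobulk_topics(X, n_topics=3, n_proj=4, n_iter=200, seed=0, eps=1e-12):
    """Hypothetical sketch: fit topics on pseudobulk samples, then project cells.

    X is a non-negative genes x cells count matrix. Not the ASAP implementation.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_genes, n_cells = X.shape

    # 1. Randomly project log-normalised cells down to n_proj dimensions (assumed step).
    lib = np.maximum(X.sum(axis=0, keepdims=True), 1.0)
    Y = np.log1p(X / lib * 1e4)
    Q = rng.standard_normal((n_proj, n_genes)) @ Y           # n_proj x n_cells

    # 2. Sign-binning: each cell gets a binary code; cells with equal codes
    #    share one pseudobulk sample (at most 2**n_proj groups).
    bits = (Q > np.median(Q, axis=1, keepdims=True)).astype(int)
    codes = (1 << np.arange(n_proj)) @ bits                  # one integer per cell
    _, membership = np.unique(codes, return_inverse=True)
    M = np.zeros((n_cells, membership.max() + 1))
    M[np.arange(n_cells), membership] = 1.0

    # 3. Aggregate raw counts into pseudobulk samples (genes x groups).
    P = X @ M

    # 4. Poisson (KL) NMF on the small pseudobulk matrix, P ~ W @ H.
    W = rng.random((n_genes, n_topics)) + 0.1
    H = rng.random((n_topics, P.shape[1])) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ (P / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((P / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    W /= W.sum(axis=0, keepdims=True) + eps                 # columns = topic gene frequencies

    # 5. Project every cell onto the fixed topics (same KL update, W held fixed).
    theta = np.ones((n_topics, n_cells))
    for _ in range(n_iter):
        theta *= (W.T @ (X / (W @ theta + eps))) / (W.sum(axis=0)[:, None] + eps)
    theta /= theta.sum(axis=0, keepdims=True) + eps          # per-cell topic proportions
    return W, theta

# Usage on simulated Poisson counts (illustrative only, not real data).
rng = np.random.default_rng(1)
X = rng.poisson(2.0, size=(50, 200))
W, theta = pseudobulk_topics(X, n_topics=3)
print(W.shape, theta.shape)  # (50, 3) (3, 200)
```

In this sketch the expensive factorisation runs on at most `2**n_proj` pseudobulk columns rather than on every cell, and only the cheap per-cell projection in step 5 scales with the number of cells. This is one plausible reading of why a pseudobulk approximation saves time and memory.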