Sfaira accelerates data and model reuse in single cell genomics.
Exploratory analysis of single-cell RNA-seq data sets is currently based on statistical and machine learning models that are adapted to each new data set from scratch. A typical analysis workflow includes a choice of dimensionality reduction, selection of clustering parameters, and mapping of prior annotation. These steps typically require several iterations and can take up significant time in many single-cell RNA-seq projects. Here, we introduce sfaira, a single-cell data and model zoo that houses data sets as well as pre-trained models. The data zoo is designed to facilitate the fast and easy contribution of data sets, interfacing to a large community of data providers. Sfaira currently includes 233 data sets across 45 organs and 3.1 million cells in both human and mouse. Using these data sets, we have trained eight different example model classes, such as autoencoders and logistic cell type predictors. The infrastructure of sfaira is model agnostic and allows training and usage of many previously published models. Sfaira directly aids in exploratory data analysis by replacing embedding and cell type annotation workflows with end-to-end pre-trained parametric models. As further example use cases for sfaira, we demonstrate the extraction of gene-centric data statistics across many tissues, improved usage of cell type labels at different levels of coarseness, and an application for learning interpretable models through data regularization on extremely diverse data sets.
Helmholtz AI - HMGU (HAI - HMGU)
Chan Zuckerberg Initiative