SC3s: efficient scaling of single cell consensus clustering to millions of cells

Bibliographic Details
Main authors: Quah, Fu Xiang; Hemberg, Martin
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9743492/
https://www.ncbi.nlm.nih.gov/pubmed/36503522
http://dx.doi.org/10.1186/s12859-022-05085-z
Description
Summary: BACKGROUND: Today it is possible to profile the transcriptome of individual cells, and a key step in the analysis of these datasets is unsupervised clustering. For very large datasets, efficient algorithms are required to ensure that analyses can be conducted with reasonable time and memory requirements. RESULTS: Here, we present a highly efficient k-means based approach, and we demonstrate that it scales favorably with the number of cells with regards to time and memory. CONCLUSIONS: We have demonstrated that our streaming k-means clustering algorithm gives state-of-the-art performance while resource requirements scale favorably for up to 2 million cells. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05085-z.
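
The summary refers to a streaming k-means algorithm. As a rough illustration of the general idea only, and not the SC3s implementation, the sketch below updates centroids one batch of cells at a time with NumPy. The function names `streaming_kmeans` and `assign` are hypothetical, and seeding the centroids from the first k rows of the first batch is a simplifying assumption (it requires the first batch to hold at least k rows).

```python
import numpy as np


def assign(points, centroids):
    """Return the index of the nearest centroid (squared Euclidean) for each row."""
    points = np.asarray(points, dtype=float)
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def streaming_kmeans(batches, k):
    """Fit k centroids from an iterable of (n_cells, n_features) batches.

    Only the centroids and per-cluster counts are kept between batches,
    so memory use does not grow with the total number of cells.
    """
    centroids = None
    counts = np.zeros(k)
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        if centroids is None:
            # Simplifying assumption: seed with the first k rows of the first batch.
            centroids = batch[:k].copy()
        labels = assign(batch, centroids)
        for j in range(k):
            members = batch[labels == j]
            if len(members) == 0:
                continue  # no cells for this cluster in this batch
            counts[j] += len(members)
            # Running-mean update: centroid j stays equal to the mean of
            # every cell assigned to it so far.
            centroids[j] += (members.sum(axis=0) - len(members) * centroids[j]) / counts[j]
    return centroids
```

Because each batch is discarded after its update, the batches can come from a generator that reads a large expression matrix from disk in chunks; `assign` can then label cells in a second pass over the same chunks.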