Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF

Bibliographic details
Main authors:	Venkatasubramanian, Meenakshi; Chetal, Kashish; Schnell, Daniel J; Atluri, Gowtham; Salomonis, Nathan
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320606/ https://www.ncbi.nlm.nih.gov/pubmed/32207533 http://dx.doi.org/10.1093/bioinformatics/btaa201

author	Venkatasubramanian, Meenakshi; Chetal, Kashish; Schnell, Daniel J; Atluri, Gowtham; Salomonis, Nathan
collection	PubMed
description	MOTIVATION: The rapid proliferation of single-cell RNA-sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface. RESULTS: We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse non-negative matrix factorization, cluster ‘fitness’, support vector machine) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell types and while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets. AVAILABILITY AND IMPLEMENTATION: ICGS2 is implemented in Python. The source code and documentation are available at http://altanalyze.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-7320606
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
journal	Bioinformatics
dates	2020-06-15; 2020-03-24
rights	© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320606/ https://www.ncbi.nlm.nih.gov/pubmed/32207533 http://dx.doi.org/10.1093/bioinformatics/btaa201