
Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data

Single-cell RNA sequencing (scRNA-seq) is a powerful tool in basic and translational biological research for characterizing cell-to-cell variation and cellular dynamics in populations that otherwise appear homogeneous. However, significant challenges arise in the analysis of scRNA-seq data, inc...


Bibliographic Details
Main Authors: Cheng, Changde, Easton, John, Rosencrance, Celeste, Li, Yan, Ju, Bensheng, Williams, Justin, Mulder, Heather L, Pang, Yakun, Chen, Wenan, Chen, Xiang
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6902034/
https://www.ncbi.nlm.nih.gov/pubmed/31566233
http://dx.doi.org/10.1093/nar/gkz826
author Cheng, Changde
Easton, John
Rosencrance, Celeste
Li, Yan
Ju, Bensheng
Williams, Justin
Mulder, Heather L
Pang, Yakun
Chen, Wenan
Chen, Xiang
collection PubMed
description Single-cell RNA sequencing (scRNA-seq) is a powerful tool in basic and translational biological research for characterizing cell-to-cell variation and cellular dynamics in populations that otherwise appear homogeneous. However, significant challenges arise in the analysis of scRNA-seq data, including a low signal-to-noise ratio with high data sparsity, potential batch effects, and scalability problems when hundreds of thousands of cells are to be analyzed, among others. The inherent complexity of scRNA-seq data and the dynamic nature of cellular processes lead to suboptimal performance of many currently available algorithms, even for basic tasks such as identifying biologically meaningful heterogeneous subpopulations. In this study, we developed Latent Cellular Analysis (LCA), a machine learning–based analytical pipeline that combines cosine-similarity measurement by latent cellular states with a graph-based clustering algorithm. LCA provides heuristic solutions for population number inference, dimension reduction, feature selection, and control of technical variations without explicit gene filtering. We show that LCA is robust, accurate, and powerful by comparison with multiple state-of-the-art computational methods when applied to large-scale real and simulated scRNA-seq data. Importantly, the ability of LCA to learn from representative subsets of the data provides scalability, thereby addressing a significant challenge posed by growing sample sizes in scRNA-seq data analysis.
format Online
Article
Text
id pubmed-6902034
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6902034 2019-12-16 Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data Nucleic Acids Res Methods Online Oxford University Press 2019-12-16 2019-09-30 /pmc/articles/PMC6902034/ /pubmed/31566233 http://dx.doi.org/10.1093/nar/gkz826 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data
topic Methods Online