Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for characterizing the cell-to-cell variation and cellular dynamics in populations which appear homogeneous otherwise in basic and translational biological research. However, significant challenges arise in the analysis of scRNA-seq data, inc...
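Main authors: | Cheng, Changde; Easton, John; Rosencrance, Celeste; Li, Yan; Ju, Bensheng; Williams, Justin; Mulder, Heather L; Pang, Yakun; Chen, Wenan; Chen, Xiang |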
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6902034/ https://www.ncbi.nlm.nih.gov/pubmed/31566233 http://dx.doi.org/10.1093/nar/gkz826 |
_version_ | 1783477610647388160 |
---|---|
author | Cheng, Changde Easton, John Rosencrance, Celeste Li, Yan Ju, Bensheng Williams, Justin Mulder, Heather L Pang, Yakun Chen, Wenan Chen, Xiang |
author_sort | Cheng, Changde |
collection | PubMed |
description | Single-cell RNA sequencing (scRNA-seq) is a powerful tool for characterizing the cell-to-cell variation and cellular dynamics in populations which appear homogeneous otherwise in basic and translational biological research. However, significant challenges arise in the analysis of scRNA-seq data, including the low signal-to-noise ratio with high data sparsity, potential batch effects, scalability problems when hundreds of thousands of cells are to be analyzed among others. The inherent complexities of scRNA-seq data and dynamic nature of cellular processes lead to suboptimal performance of many currently available algorithms, even for basic tasks such as identifying biologically meaningful heterogeneous subpopulations. In this study, we developed the Latent Cellular Analysis (LCA), a machine learning–based analytical pipeline that combines cosine-similarity measurement by latent cellular states with a graph-based clustering algorithm. LCA provides heuristic solutions for population number inference, dimension reduction, feature selection, and control of technical variations without explicit gene filtering. We show that LCA is robust, accurate, and powerful by comparison with multiple state-of-the-art computational methods when applied to large-scale real and simulated scRNA-seq data. Importantly, the ability of LCA to learn from representative subsets of the data provides scalability, thereby addressing a significant challenge posed by growing sample sizes in scRNA-seq data analysis. |
format | Online Article Text |
id | pubmed-6902034 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6902034 2019-12-16 Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data Cheng, Changde Easton, John Rosencrance, Celeste Li, Yan Ju, Bensheng Williams, Justin Mulder, Heather L Pang, Yakun Chen, Wenan Chen, Xiang Nucleic Acids Res Methods Online Single-cell RNA sequencing (scRNA-seq) is a powerful tool for characterizing the cell-to-cell variation and cellular dynamics in populations which appear homogeneous otherwise in basic and translational biological research. However, significant challenges arise in the analysis of scRNA-seq data, including the low signal-to-noise ratio with high data sparsity, potential batch effects, scalability problems when hundreds of thousands of cells are to be analyzed among others. The inherent complexities of scRNA-seq data and dynamic nature of cellular processes lead to suboptimal performance of many currently available algorithms, even for basic tasks such as identifying biologically meaningful heterogeneous subpopulations. In this study, we developed the Latent Cellular Analysis (LCA), a machine learning–based analytical pipeline that combines cosine-similarity measurement by latent cellular states with a graph-based clustering algorithm. LCA provides heuristic solutions for population number inference, dimension reduction, feature selection, and control of technical variations without explicit gene filtering. We show that LCA is robust, accurate, and powerful by comparison with multiple state-of-the-art computational methods when applied to large-scale real and simulated scRNA-seq data. Importantly, the ability of LCA to learn from representative subsets of the data provides scalability, thereby addressing a significant challenge posed by growing sample sizes in scRNA-seq data analysis. Oxford University Press 2019-12-16 2019-09-30 /pmc/articles/PMC6902034/ /pubmed/31566233 http://dx.doi.org/10.1093/nar/gkz826 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6902034/ https://www.ncbi.nlm.nih.gov/pubmed/31566233 http://dx.doi.org/10.1093/nar/gkz826 |
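
The description above says that LCA combines cosine-similarity measurement on latent cellular states with a graph-based clustering algorithm. The toy sketch below only illustrates that general idea and is not the authors' implementation: the latent space is assumed to come from a plain SVD of log-transformed counts, the graph is assumed to be a simple similarity-threshold graph, and clusters are taken as its connected components. The function names (`latent_states`, `cosine_similarity`, `graph_clusters`), the threshold of 0.5, and the simulated counts are all hypothetical choices made for illustration.

```python
import numpy as np


def latent_states(counts, n_components):
    """Project log-transformed counts onto the top singular vectors.

    Assumption: a plain SVD stands in for whatever latent-state model
    the published pipeline actually uses.
    """
    logged = np.log1p(counts)
    centered = logged - logged.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def graph_clusters(similarity, threshold):
    """Label connected components of the graph that links two cells
    whenever their similarity exceeds the threshold (hypothetical rule)."""
    n = similarity.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in np.flatnonzero(similarity[node] > threshold):
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    stack.append(neighbor)
        current += 1
    return labels


# Simulated data: two toy populations of 20 cells and 50 genes each,
# with a different half of the genes highly expressed in each population.
rng = np.random.default_rng(0)
high, low = np.full(25, 20.0), np.full(25, 1.0)
group_a = rng.poisson(lam=np.r_[high, low], size=(20, 50))
group_b = rng.poisson(lam=np.r_[low, high], size=(20, 50))
counts = np.vstack([group_a, group_b])

latent = latent_states(counts, n_components=2)
labels = graph_clusters(cosine_similarity(latent), threshold=0.5)
print(labels)
```

Cosine similarity compares the direction of each cell's latent vector rather than its magnitude, which is one reason it is a common choice when total counts differ between cells; the thresholded graph here is only the simplest possible stand-in for a graph-based clustering step.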