scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets

Bibliographic Details
Main authors: Song, Qianqian, Su, Jing, Miller, Lance D., Zhang, Wei
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8602751/
https://www.ncbi.nlm.nih.gov/pubmed/33359676
http://dx.doi.org/10.1016/j.gpb.2020.09.002
author Song, Qianqian
Su, Jing
Miller, Lance D.
Zhang, Wei
collection PubMed
description In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that substantially limit biological expectations of co-expressed genes. Herein, we present single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biological context. Importantly, scLM can simultaneously cluster multiple single-cell datasets, i.e., perform consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results from both simulated and experimental data demonstrate that scLM outperforms existing methods with considerably improved accuracy. To illustrate the biological insights of scLM, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.
format Online
Article
Text
id pubmed-8602751
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8602751 2021-11-24 Genomics Proteomics Bioinformatics Method
Elsevier 2021-04 2020-12-24 © 2021 Beijing Institute of Genomics. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
title scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8602751/
https://www.ncbi.nlm.nih.gov/pubmed/33359676
http://dx.doi.org/10.1016/j.gpb.2020.09.002