scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets
Main authors: | Song, Qianqian; Su, Jing; Miller, Lance D.; Zhang, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8602751/ https://www.ncbi.nlm.nih.gov/pubmed/33359676 http://dx.doi.org/10.1016/j.gpb.2020.09.002 |
_version_ | 1784601630263476224 |
---|---|
author | Song, Qianqian Su, Jing Miller, Lance D. Zhang, Wei |
collection | PubMed |
description | In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provide critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that substantially limit biological expectations of co-expressed genes. Herein, we present the single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biological context. Importantly, scLM can simultaneously cluster multiple single-cell datasets, i.e., perform consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results on both simulated and experimental data demonstrate that scLM outperforms existing methods with considerably improved accuracy. To illustrate the biological insights scLM can provide, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM. |
format | Online Article Text |
id | pubmed-8602751 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8602751 2021-11-24 scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets Song, Qianqian Su, Jing Miller, Lance D. Zhang, Wei Genomics Proteomics Bioinformatics Method Elsevier 2021-04 2020-12-24 /pmc/articles/PMC8602751/ /pubmed/33359676 http://dx.doi.org/10.1016/j.gpb.2020.09.002 Text en © 2021 Beijing Institute of Genomics https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8602751/ https://www.ncbi.nlm.nih.gov/pubmed/33359676 http://dx.doi.org/10.1016/j.gpb.2020.09.002 |