scEFSC: Accurate single-cell RNA-seq data analysis via ensemble consensus clustering based on multiple feature selections

With the development of next-generation sequencing technologies, single-cell RNA sequencing (scRNA-seq) has become an indispensable tool to reveal the wide heterogeneity between cells. Clustering is a fundamental task in this analysis to disclose the transcriptomic profiles of single cells and is o...

Bibliographic Details
Main authors: Bian, Chuang; Wang, Xubin; Su, Yanchi; Wang, Yunhe; Wong, Ka-chun; Li, Xiangtao
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9108753/
https://www.ncbi.nlm.nih.gov/pubmed/35615016
http://dx.doi.org/10.1016/j.csbj.2022.04.023
_version_ 1784708772710580224
author Bian, Chuang
Wang, Xubin
Su, Yanchi
Wang, Yunhe
Wong, Ka-chun
Li, Xiangtao
author_sort Bian, Chuang
collection PubMed
description With the development of next-generation sequencing technologies, single-cell RNA sequencing (scRNA-seq) has become an indispensable tool to reveal the wide heterogeneity between cells. Clustering is a fundamental task in this analysis to disclose the transcriptomic profiles of single cells and is one of the key computational problems that has received widespread attention. Recently, many clustering algorithms have been developed for scRNA-seq data. Nevertheless, the computational models often suffer from practical limitations such as numerical instability, high dimensionality and poor computational scalability. Moreover, growing cell numbers and high dropout rates pose a major computational challenge to the analysis. To address these limitations, we first provide a systematic and extensive performance evaluation of four feature selection methods and nine scRNA-seq clustering algorithms on fourteen real single-cell RNA-seq datasets. Based on this, we then propose an accurate single-cell data analysis method via Ensemble Feature Selection based Clustering, called scEFSC. The algorithm employs several unsupervised feature selection methods to remove genes that do not contribute significantly to the scRNA-seq data. After that, different single-cell RNA-seq clustering algorithms are applied to cluster the data filtered by the multiple unsupervised feature selection methods, and the clustering results are then combined using weighted meta-clustering. We applied scEFSC to the fourteen real single-cell RNA-seq datasets and the experimental results demonstrated that scEFSC outperformed the other scRNA-seq clustering algorithms on several evaluation metrics. In addition, we established the biological interpretability of scEFSC by carrying out differential gene expression analysis, gene ontology enrichment and KEGG analysis. scEFSC is available at https://github.com/Conan-Bian/scEFSC.
format Online
Article
Text
id pubmed-9108753
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-91087532022-05-24 scEFSC: Accurate single-cell RNA-seq data analysis via ensemble consensus clustering based on multiple feature selections Bian, Chuang Wang, Xubin Su, Yanchi Wang, Yunhe Wong, Ka-chun Li, Xiangtao Comput Struct Biotechnol J Research Article With the development of next-generation sequencing technologies, single-cell RNA sequencing (scRNA-seq) has become one indispensable tool to reveal the wide heterogeneity between cells. Clustering is a fundamental task in this analysis to disclose the transcriptomic profiles of single cells and is one of the key computational problems that has received widespread attention. Recently, many clustering algorithms have been developed for the scRNA-seq data. Nevertheless, the computational models often suffer from realistic restrictions such as numerical instability, high dimensionality and computational scalability. Moreover, the accumulating cell numbers and high dropout rates bring a huge computational challenge to the analysis. To address these limitations, we first provide a systematic and extensive performance evaluation of four feature selection methods and nine scRNA-seq clustering algorithms on fourteen real single-cell RNA-seq datasets. Based on this, we then propose an accurate single-cell data analysis via Ensemble Feature Selection based Clustering, called scEFSC. Indeed, the algorithm employs several unsupervised feature selections to remove genes that do not contribute significantly to the scRNA-seq data. After that, different single-cell RNA-seq clustering algorithms are proposed to cluster the data filtered by multiple unsupervised feature selections, and then the clustering results are combined using weighted-based meta-clustering. We applied scEFSC to the fourteen real single-cell RNA-seq datasets and the experimental results demonstrated that our proposed scEFSC outperformed the other scRNA-seq clustering algorithms with several evaluation metrics. 
In addition, we established the biological interpretability of scEFSC by carrying out differential gene expression analysis, gene ontology enrichment and KEGG analysis. scEFSC is available at https://github.com/Conan-Bian/scEFSC. Research Network of Computational and Structural Biotechnology 2022-04-27 /pmc/articles/PMC9108753/ /pubmed/35615016 http://dx.doi.org/10.1016/j.csbj.2022.04.023 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title scEFSC: Accurate single-cell RNA-seq data analysis via ensemble consensus clustering based on multiple feature selections
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9108753/
https://www.ncbi.nlm.nih.gov/pubmed/35615016
http://dx.doi.org/10.1016/j.csbj.2022.04.023
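The description field above says that clusterings obtained from differently filtered data are combined by a weighted meta-clustering step. As a rough illustration of that idea only, the sketch below merges several label vectors through a weighted co-association vote followed by connected components. This is a simplified stand-in, not the scEFSC implementation: the function name `weighted_consensus`, the `threshold` parameter, the toy labels and the weights are all assumptions made for this example.

```python
from itertools import combinations


def weighted_consensus(partitions, weights, threshold=0.5):
    """Merge several clusterings of the same cells into one labelling.

    Hypothetical helper for illustration; not taken from the scEFSC code.
    partitions: list of label lists, one per base clustering.
    weights:    one non-negative weight per base clustering.
    Two cells end up together when the weighted share of clusterings
    that co-cluster them exceeds `threshold`.
    """
    if not partitions or len(partitions) != len(weights):
        raise ValueError("need exactly one weight per partition")
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same cells")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Union-find structure used to collect connected components.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        # Weighted co-association: share of the total weight held by
        # clusterings that place cells i and j in the same cluster.
        agree = sum(w for p, w in zip(partitions, weights) if p[i] == p[j]) / total
        if agree > threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    seen = {}
    return [seen.setdefault(find(i), len(seen)) for i in range(n)]


# Toy input: three base clusterings of six cells (made-up labels).
partitions = [
    [0, 0, 0, 1, 1, 1],  # e.g. clustering after feature selection A
    [1, 1, 1, 0, 0, 0],  # same grouping under different label names
    [0, 0, 1, 1, 2, 2],  # a noisier clustering
]
weights = [0.4, 0.4, 0.2]  # assumed quality weights, one per clustering

consensus = weighted_consensus(partitions, weights)
print(consensus)  # prints [0, 0, 0, 1, 1, 1]
```

Because the vote is taken on pairs of cells rather than on label values, base clusterings that use different label names for the same grouping still reinforce each other. How the weights are derived is left open here; the record does not state it.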