
A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study

Critical in revealing cell heterogeneity and identifying new cell subtypes, cell clustering based on single-cell RNA sequencing (scRNA-seq) is challenging. Due to the high noise, sparsity, and poor annotation of scRNA-seq data, existing state-of-the-art cell clustering methods usually ignore gene fu...

Bibliographic Details
Main authors: Zhuang, Jujuan, Ren, Changjing, Ren, Dan, Li, Yu’ang, Liu, Danyang, Cui, Lingyu, Tian, Geng, Yang, Jiasheng, Liu, Jingbo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8670174/
https://www.ncbi.nlm.nih.gov/pubmed/34917514
http://dx.doi.org/10.3389/fonc.2021.797057
_version_ 1784614925144948736
author Zhuang, Jujuan
Ren, Changjing
Ren, Dan
Li, Yu’ang
Liu, Danyang
Cui, Lingyu
Tian, Geng
Yang, Jiasheng
Liu, Jingbo
author_facet Zhuang, Jujuan
Ren, Changjing
Ren, Dan
Li, Yu’ang
Liu, Danyang
Cui, Lingyu
Tian, Geng
Yang, Jiasheng
Liu, Jingbo
author_sort Zhuang, Jujuan
collection PubMed
description Cell clustering based on single-cell RNA sequencing (scRNA-seq) is critical for revealing cell heterogeneity and identifying new cell subtypes, but it remains challenging. Because scRNA-seq data are noisy, sparse, and poorly annotated, existing state-of-the-art cell clustering methods usually ignore gene functions and gene interactions. In this study, we propose a feature extraction method, named FEGFS, that analyzes scRNA-seq data by taking advantage of known gene functions. Specifically, we first derive functional gene sets based on Gene Ontology (GO) terms and reduce their redundancy by semantic similarity analysis and gene repetitive rate reduction. Then, we apply kernel principal component analysis to select features on each non-redundant functional gene set, and we combine the selected features from all functional gene sets for subsequent clustering analysis. To test the performance of FEGFS, we apply agglomerative hierarchical clustering based on FEGFS and compare it with seven state-of-the-art clustering methods on six real scRNA-seq datasets. For small datasets like Pollen and Goolam, FEGFS outperforms all methods on all four evaluation metrics: adjusted Rand index (ARI), normalized mutual information (NMI), homogeneity score (HOM), and completeness score (COM). For example, the ARIs of FEGFS are 0.955 and 0.910, respectively, on Pollen and Goolam, and those of the second-best method are only 0.938 and 0.910, respectively. For large datasets, FEGFS also outperforms most methods. For example, the ARIs of FEGFS are 0.781 on both Klein and Zeisel, which are higher than those of all other methods except SC3, whose ARIs are slightly higher (0.798 and 0.807, respectively). Moreover, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation and in inferring cell lineage trajectories. As for application, taking glioma as an example, we demonstrate that our clustering method can identify important cell clusters related to glioma and infer key marker genes related to these cell clusters.
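The pipeline summarized in the description (kernel PCA on each functional gene set, concatenation of the resulting features, agglomerative hierarchical clustering, and scoring with ARI, NMI, HOM, and COM) can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' FEGFS implementation. The record does not state the kernel, the number of components, or the linkage, so the RBF kernel, two components per gene set, and scikit-learn's default Ward linkage are assumptions. The GO-based derivation and redundancy reduction of gene sets are omitted, and `gene_sets`, `gene_set_features`, `cluster_and_score`, and the toy expression matrix are hypothetical names and data introduced only for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import KernelPCA
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    normalized_mutual_info_score,
)


def gene_set_features(expr, gene_sets, n_components=2, kernel="rbf"):
    """Run kernel PCA on each gene set's columns and concatenate the outputs.

    expr      : (cells x genes) expression matrix
    gene_sets : dict mapping a set name to a list of column indices
                (hypothetical stand-in for non-redundant GO gene sets)
    """
    blocks = []
    for cols in gene_sets.values():
        # Assumed settings: RBF kernel, fixed number of components per set.
        kpca = KernelPCA(n_components=n_components, kernel=kernel)
        blocks.append(kpca.fit_transform(expr[:, cols]))
    return np.hstack(blocks)


def cluster_and_score(features, true_labels, n_clusters):
    """Agglomerative clustering followed by the four metrics named above."""
    model = AgglomerativeClustering(n_clusters=n_clusters)  # Ward linkage by default
    pred = model.fit_predict(features)
    scores = {
        "ARI": adjusted_rand_score(true_labels, pred),
        "NMI": normalized_mutual_info_score(true_labels, pred),
        "HOM": homogeneity_score(true_labels, pred),
        "COM": completeness_score(true_labels, pred),
    }
    return pred, scores


# Toy data: 60 cells in 3 well-separated groups, 12 genes in 3 gene sets.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
centers = rng.normal(scale=5.0, size=(3, 12))
expr = centers[labels] + rng.normal(size=(60, 12))
gene_sets = {
    "set_a": [0, 1, 2, 3],
    "set_b": [4, 5, 6, 7],
    "set_c": [8, 9, 10, 11],
}

features = gene_set_features(expr, gene_sets)
pred, scores = cluster_and_score(features, labels, n_clusters=3)
print(features.shape)
print(scores)
```

On real scRNA-seq data the expression matrix would normally be normalized and log-transformed first, and the number of clusters would have to be chosen or estimated rather than taken from known labels.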
format Online
Article
Text
id pubmed-8670174
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8670174 2021-12-15 A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study Zhuang, Jujuan Ren, Changjing Ren, Dan Li, Yu’ang Liu, Danyang Cui, Lingyu Tian, Geng Yang, Jiasheng Liu, Jingbo Front Oncol Oncology Cell clustering based on single-cell RNA sequencing (scRNA-seq) is critical for revealing cell heterogeneity and identifying new cell subtypes, but it remains challenging. Because scRNA-seq data are noisy, sparse, and poorly annotated, existing state-of-the-art cell clustering methods usually ignore gene functions and gene interactions. In this study, we propose a feature extraction method, named FEGFS, that analyzes scRNA-seq data by taking advantage of known gene functions. Specifically, we first derive functional gene sets based on Gene Ontology (GO) terms and reduce their redundancy by semantic similarity analysis and gene repetitive rate reduction. Then, we apply kernel principal component analysis to select features on each non-redundant functional gene set, and we combine the selected features from all functional gene sets for subsequent clustering analysis. To test the performance of FEGFS, we apply agglomerative hierarchical clustering based on FEGFS and compare it with seven state-of-the-art clustering methods on six real scRNA-seq datasets. For small datasets like Pollen and Goolam, FEGFS outperforms all methods on all four evaluation metrics: adjusted Rand index (ARI), normalized mutual information (NMI), homogeneity score (HOM), and completeness score (COM). For example, the ARIs of FEGFS are 0.955 and 0.910, respectively, on Pollen and Goolam, and those of the second-best method are only 0.938 and 0.910, respectively. For large datasets, FEGFS also outperforms most methods. For example, the ARIs of FEGFS are 0.781 on both Klein and Zeisel, which are higher than those of all other methods except SC3, whose ARIs are slightly higher (0.798 and 0.807, respectively). Moreover, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation and in inferring cell lineage trajectories. As for application, taking glioma as an example, we demonstrate that our clustering method can identify important cell clusters related to glioma and infer key marker genes related to these cell clusters. Frontiers Media S.A. 2021-11-30 /pmc/articles/PMC8670174/ /pubmed/34917514 http://dx.doi.org/10.3389/fonc.2021.797057 Text en Copyright © 2021 Zhuang, Ren, Ren, Li, Liu, Cui, Tian, Yang and Liu https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Oncology
Zhuang, Jujuan
Ren, Changjing
Ren, Dan
Li, Yu’ang
Liu, Danyang
Cui, Lingyu
Tian, Geng
Yang, Jiasheng
Liu, Jingbo
A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
title A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
title_full A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
title_fullStr A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
title_full_unstemmed A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
title_short A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study
title_sort novel single-cell rna sequencing data feature extraction method based on gene function analysis and its applications in glioma study
topic Oncology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8670174/
https://www.ncbi.nlm.nih.gov/pubmed/34917514
http://dx.doi.org/10.3389/fonc.2021.797057
work_keys_str_mv AT zhuangjujuan anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT renchangjing anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT rendan anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT liyuang anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT liudanyang anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT cuilingyu anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT tiangeng anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT yangjiasheng anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT liujingbo anovelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT zhuangjujuan novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT renchangjing novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT rendan novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT liyuang novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT liudanyang novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT cuilingyu novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT tiangeng novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT yangjiasheng novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy
AT liujingbo novelsinglecellrnasequencingdatafeatureextractionmethodbasedongenefunctionanalysisanditsapplicationsingliomastudy