
Cell Type Annotation Model Selection: General-Purpose vs. Pattern-Aware Feature Gene Selection in Single-Cell RNA-Seq Data †

With the advances in high-throughput sequencing technology, an increasing amount of research in revealing heterogeneity among cells has been widely performed. Differences between individual cells’ functionality are determined based on the differences in the gene expression profiles. Although the obs...


Bibliographic Details
Main Authors: Vasighizaker, Akram; Trivedi, Yash; Rueda, Luis
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048047/
https://www.ncbi.nlm.nih.gov/pubmed/36980868
http://dx.doi.org/10.3390/genes14030596
author Vasighizaker, Akram
Trivedi, Yash
Rueda, Luis
collection PubMed
description With the advances in high-throughput sequencing technology, an increasing amount of research in revealing heterogeneity among cells has been widely performed. Differences between individual cells’ functionality are determined based on the differences in the gene expression profiles. Although the observations indicate a great performance of clustering methods, manual annotation of the clusters of cells is a challenge yet to be addressed more scalable and faster. On the other hand, due to the lack of enough labelled datasets, just a few supervised techniques have been used in cell type identification, and they obtained more robust results compared to clustering methods. A recent study showed that a complementary step of feature selection helped support vector machine (SVM) to outperform other classifiers in different scenarios. In this article, we compare and evaluate the performance of two state-of-the-art supervised methods, XGBoost and SVM, with information gain as a feature selection method. The results of the experiments on three standard scRNA-seq datasets indicate that XGBoost automatically annotates cell types in a simpler and more scalable framework. Additionally, it sheds light on the potential use of boosting tree approaches combined with deep neural networks to capture underlying information of single-cell RNA-Seq data more effectively. It can be used to identify marker genes and other applications in biological studies.
format Online
Article
Text
id pubmed-10048047
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10048047 2023-03-29 Genes (Basel) Article MDPI 2023-02-26 /pmc/articles/PMC10048047/ /pubmed/36980868 http://dx.doi.org/10.3390/genes14030596 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Cell Type Annotation Model Selection: General-Purpose vs. Pattern-Aware Feature Gene Selection in Single-Cell RNA-Seq Data †
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048047/
https://www.ncbi.nlm.nih.gov/pubmed/36980868
http://dx.doi.org/10.3390/genes14030596