Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional work...

Bibliographic Details
Main Authors: Zhao, Xinlei, Wu, Shuang, Fang, Nan, Sun, Xiao, Fan, Jue
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947964/
https://www.ncbi.nlm.nih.gov/pubmed/31675098
http://dx.doi.org/10.1093/bib/bbz096
_version_ 1783663337436872704
author Zhao, Xinlei
Wu, Shuang
Fang, Nan
Sun, Xiao
Fan, Jue
author_facet Zhao, Xinlei
Wu, Shuang
Fang, Nan
Sun, Xiao
Fan, Jue
author_sort Zhao, Xinlei
collection PubMed
description Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
format Online
Article
Text
id pubmed-7947964
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-79479642021-03-16 Evaluation of single-cell classifiers for single-cell RNA sequencing data sets Zhao, Xinlei Wu, Shuang Fang, Nan Sun, Xiao Fan, Jue Brief Bioinform Review Article Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools. Oxford University Press 2019-10-23 /pmc/articles/PMC7947964/ /pubmed/31675098 http://dx.doi.org/10.1093/bib/bbz096 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Review Article
Zhao, Xinlei
Wu, Shuang
Fang, Nan
Sun, Xiao
Fan, Jue
Evaluation of single-cell classifiers for single-cell RNA sequencing data sets
title Evaluation of single-cell classifiers for single-cell RNA sequencing data sets
title_full Evaluation of single-cell classifiers for single-cell RNA sequencing data sets
title_fullStr Evaluation of single-cell classifiers for single-cell RNA sequencing data sets
title_full_unstemmed Evaluation of single-cell classifiers for single-cell RNA sequencing data sets
title_short Evaluation of single-cell classifiers for single-cell RNA sequencing data sets
title_sort evaluation of single-cell classifiers for single-cell rna sequencing data sets
topic Review Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947964/
https://www.ncbi.nlm.nih.gov/pubmed/31675098
http://dx.doi.org/10.1093/bib/bbz096
work_keys_str_mv AT zhaoxinlei evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets
AT wushuang evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets
AT fangnan evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets
AT sunxiao evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets
AT fanjue evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets