Evaluation of single-cell classifiers for single-cell RNA sequencing data sets
Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional work...
Main authors: | Zhao, Xinlei; Wu, Shuang; Fang, Nan; Sun, Xiao; Fan, Jue |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947964/ https://www.ncbi.nlm.nih.gov/pubmed/31675098 http://dx.doi.org/10.1093/bib/bbz096 |
_version_ | 1783663337436872704 |
---|---|
author | Zhao, Xinlei; Wu, Shuang; Fang, Nan; Sun, Xiao; Fan, Jue |
author_facet | Zhao, Xinlei; Wu, Shuang; Fang, Nan; Sun, Xiao; Fan, Jue |
author_sort | Zhao, Xinlei |
collection | PubMed |
description | Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools. |
format | Online Article Text |
id | pubmed-7947964 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7947964 2021-03-16 Evaluation of single-cell classifiers for single-cell RNA sequencing data sets Zhao, Xinlei; Wu, Shuang; Fang, Nan; Sun, Xiao; Fan, Jue Brief Bioinform Review Article Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools. Oxford University Press 2019-10-23 /pmc/articles/PMC7947964/ /pubmed/31675098 http://dx.doi.org/10.1093/bib/bbz096 Text en © The Author(s) 2019. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Review Article Zhao, Xinlei Wu, Shuang Fang, Nan Sun, Xiao Fan, Jue Evaluation of single-cell classifiers for single-cell RNA sequencing data sets |
title | Evaluation of single-cell classifiers for single-cell RNA sequencing data sets |
title_sort | evaluation of single-cell classifiers for single-cell rna sequencing data sets |
topic | Review Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947964/ https://www.ncbi.nlm.nih.gov/pubmed/31675098 http://dx.doi.org/10.1093/bib/bbz096 |
work_keys_str_mv | AT zhaoxinlei evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets AT wushuang evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets AT fangnan evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets AT sunxiao evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets AT fanjue evaluationofsinglecellclassifiersforsinglecellrnasequencingdatasets |
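
The description above reports that a simple ensemble voting across tools can improve predictive accuracy, and that some tools can assign an 'unassigned' label. The record does not state the exact voting rule the authors used, so the following is only a minimal sketch of per-cell majority voting under stated assumptions: the tool names (`tool_a`, `tool_b`, `tool_c`), the `min_agreement` threshold, and the function name `majority_vote` are all hypothetical and not taken from the article.

```python
from collections import Counter

# Assumed placeholder label; the article mentions 'unassigned' labels,
# but how each tool encodes them is not specified in this record.
UNASSIGNED = "unassigned"


def majority_vote(predictions, min_agreement=0.5):
    """Combine per-cell labels from several classifiers by majority vote.

    predictions: dict mapping a (hypothetical) tool name to a list of
        predicted labels, one per cell, all in the same cell order.
    min_agreement: assumed threshold; the winning label must be chosen by
        strictly more than this fraction of all tools, otherwise the cell
        is left unassigned.
    """
    per_tool = list(predictions.values())
    if not per_tool:
        return []
    n_cells = len(per_tool[0])
    if any(len(labels) != n_cells for labels in per_tool):
        raise ValueError("all tools must label the same number of cells")

    consensus = []
    for cell_labels in zip(*per_tool):
        # 'unassigned' votes do not count toward any cell type.
        votes = Counter(lbl for lbl in cell_labels if lbl != UNASSIGNED)
        if not votes:
            consensus.append(UNASSIGNED)
            continue
        label, count = votes.most_common(1)[0]
        if count / len(cell_labels) > min_agreement:
            consensus.append(label)
        else:
            consensus.append(UNASSIGNED)
    return consensus


# Toy labels for three cells; these are illustrative, not results from the article.
example = {
    "tool_a": ["T cell", "B cell", "NK cell"],
    "tool_b": ["T cell", "B cell", UNASSIGNED],
    "tool_c": ["T cell", "monocyte", "B cell"],
}
consensus_labels = majority_vote(example)
print(consensus_labels)
```

In this toy input, the third cell receives no label with more than half of the votes, so it stays unassigned; lowering `min_agreement` would trade fewer unassigned cells for a higher risk of mislabeling.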