Cargando…

Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction

BACKGROUND: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and c...

Descripción completa

Detalles Bibliográficos
Autores principales:	Ma, Wenjing, Su, Kenong, Wu, Hao
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2021
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427961/ https://www.ncbi.nlm.nih.gov/pubmed/34503564 http://dx.doi.org/10.1186/s13059-021-02480-2

_version_	1783750280802729984
author	Ma, Wenjing Su, Kenong Wu, Hao
author_facet	Ma, Wenjing Su, Kenong Wu, Hao
author_sort	Ma, Wenjing
collection	PubMed
description	BACKGROUND: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite all the advantages, the performance of the supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. RESULTS: In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types in cell type prediction. Next, we focus on how discrepancies between reference and target datasets and how data preprocessing such as imputation and batch effect correction affect prediction performance. We also investigate the strategies of pooling and purifying reference data. CONCLUSIONS: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and use multi-layer perceptron (MLP) as the classifier, along with F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02480-2.
format	Online Article Text
id	pubmed-8427961
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-84279612021-09-10 Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction Ma, Wenjing Su, Kenong Wu, Hao Genome Biol Research BACKGROUND: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite all the advantages, the performance of the supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. RESULTS: In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types in cell type prediction. Next, we focus on how discrepancies between reference and target datasets and how data preprocessing such as imputation and batch effect correction affect prediction performance. We also investigate the strategies of pooling and purifying reference data. CONCLUSIONS: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and use multi-layer perceptron (MLP) as the classifier, along with F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02480-2. BioMed Central 2021-09-09 /pmc/articles/PMC8427961/ /pubmed/34503564 http://dx.doi.org/10.1186/s13059-021-02480-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Ma, Wenjing Su, Kenong Wu, Hao Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
title	Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
title_full	Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
title_fullStr	Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
title_full_unstemmed	Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
title_short	Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
title_sort	evaluation of some aspects in supervised cell type identification for single-cell rna-seq: classifier, feature selection, and reference construction
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427961/ https://www.ncbi.nlm.nih.gov/pubmed/34503564 http://dx.doi.org/10.1186/s13059-021-02480-2
work_keys_str_mv	AT mawenjing evaluationofsomeaspectsinsupervisedcelltypeidentificationforsinglecellrnaseqclassifierfeatureselectionandreferenceconstruction AT sukenong evaluationofsomeaspectsinsupervisedcelltypeidentificationforsinglecellrnaseqclassifierfeatureselectionandreferenceconstruction AT wuhao evaluationofsomeaspectsinsupervisedcelltypeidentificationforsinglecellrnaseqclassifierfeatureselectionandreferenceconstruction

Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction

Ejemplares similares