
Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction



Bibliographic Details
Main authors: Ma, Wenjing; Su, Kenong; Wu, Hao
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427961/
https://www.ncbi.nlm.nih.gov/pubmed/34503564
http://dx.doi.org/10.1186/s13059-021-02480-2
_version_ 1783750280802729984
author Ma, Wenjing
Su, Kenong
Wu, Hao
collection PubMed
description BACKGROUND: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite these advantages, the performance of supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. RESULTS: In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell type identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types on cell type prediction. Next, we focus on how discrepancies between reference and target datasets, and data preprocessing steps such as imputation and batch effect correction, affect prediction performance. We also investigate strategies for pooling and purifying reference data. CONCLUSIONS: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and using a multi-layer perceptron (MLP) as the classifier, with the F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02480-2.
format Online
Article
Text
id pubmed-8427961
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8427961 2021-09-10 Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction. Ma, Wenjing; Su, Kenong; Wu, Hao. Genome Biol. Research.
BioMed Central 2021-09-09 /pmc/articles/PMC8427961/ /pubmed/34503564 http://dx.doi.org/10.1186/s13059-021-02480-2 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
topic Research