Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data

Bibliographic Details
Main authors: Diaz-Mejia, J. Javier, Meng, Elaine C., Pico, Alexander R., MacParland, Sonya A., Ketela, Troy, Pugh, Trevor J., Bader, Gary D., Morris, John H.
Format: Online, Article, Text
Language: English
Published: F1000 Research Limited 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6720041/
https://www.ncbi.nlm.nih.gov/pubmed/31508207
http://dx.doi.org/10.12688/f1000research.18490.3
collection PubMed
description Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures.
Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells.
Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results.
Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
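
The abstract reports a high average ROC AUC (0.91) alongside a much lower average precision-recall AUC (0.53). The minimal Python sketch below illustrates, in general terms only, how a single ranking can produce both kinds of number: if each cluster has one correct label among many candidate labels, negatives vastly outnumber positives, so ROC AUC rewards a positive for outscoring the many negatives below it, while precision-recall penalizes every negative ranked above it. The toy scores, labels, and function names are hypothetical; this is not the authors' evaluation code from the linked repository, and it estimates PR AUC as average precision, which is one common estimator among several.

```python
# Toy comparison of ROC AUC and precision-recall AUC when positives are rare.
# Assumption (hypothetical): each (cluster, candidate label) pair gets a score,
# and only the pair carrying the correct label counts as a positive.
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: List[float], labels: List[int]) -> float:
    """PR AUC estimated as average precision: the mean precision at each positive's rank.
    Tied scores keep their input order, because sorted() is stable even with reverse=True."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: 20 cluster-label pairs, 2 positives ranked 3rd and 6th.
    scores = [float(s) for s in range(20, 0, -1)]
    labels = [0] * 20
    labels[2] = 1
    labels[5] = 1
    print(f"ROC AUC = {roc_auc(scores, labels):.2f}")  # ROC AUC = 0.83
    print(f"PR AUC (AP) = {average_precision(scores, labels):.2f}")  # PR AUC (AP) = 0.33
```

Counting pairwise wins directly is O(P·N), which is fine for a toy example; for real evaluations a library routine such as scikit-learn's `roc_auc_score` and `average_precision_score` would be the usual choice.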
id pubmed-6720041
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source F1000Res, Research Article, F1000 Research Limited (dates in record: 2019-09-09, 2019-10-14)
rights Copyright: © 2019 Diaz-Mejia JJ et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article