Evaluation of single-cell RNAseq labelling algorithms using cancer datasets
Single-cell RNA sequencing (scRNA-seq) clustering and labelling methods are used to determine precise cellular composition of tissue samples. Automated labelling methods rely on either unsupervised, cluster-based approaches or supervised, cell-based approaches to identify cell types. The high comple...
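Main authors: | Christensen, Erik; Luo, Ping; Turinsky, Andrei; Husić, Mia; Mahalanabis, Alaina; Naidas, Alaine; Diaz-Mejia, Juan Javier; Brudno, Michael; Pugh, Trevor; Ramani, Arun; Shooshtari, Parisa |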
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Review |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9851326/ https://www.ncbi.nlm.nih.gov/pubmed/36585784 http://dx.doi.org/10.1093/bib/bbac561 |
_version_ | 1784872371562217472 |
---|---|
author | Christensen, Erik; Luo, Ping; Turinsky, Andrei; Husić, Mia; Mahalanabis, Alaina; Naidas, Alaine; Diaz-Mejia, Juan Javier; Brudno, Michael; Pugh, Trevor; Ramani, Arun; Shooshtari, Parisa |
author_facet | Christensen, Erik; Luo, Ping; Turinsky, Andrei; Husić, Mia; Mahalanabis, Alaina; Naidas, Alaine; Diaz-Mejia, Juan Javier; Brudno, Michael; Pugh, Trevor; Ramani, Arun; Shooshtari, Parisa |
author_sort | Christensen, Erik |
collection | PubMed |
description | Single-cell RNA sequencing (scRNA-seq) clustering and labelling methods are used to determine precise cellular composition of tissue samples. Automated labelling methods rely on either unsupervised, cluster-based approaches or supervised, cell-based approaches to identify cell types. The high complexity of cancer poses a unique challenge, as tumor microenvironments are often composed of diverse cell subpopulations with unique functional effects that may lead to disease progression, metastasis and treatment resistance. Here, we assess 17 cell-based and 9 cluster-based scRNA-seq labelling algorithms using 8 cancer datasets, providing a comprehensive large-scale assessment of such methods in a cancer-specific context. Using several performance metrics, we show that cell-based methods generally achieved higher performance and were faster compared to cluster-based methods. Cluster-based methods more successfully labelled non-malignant cell types, likely because of a lack of gene signatures for relevant malignant cell subpopulations. Larger cell numbers present in some cell types in training data positively impacted prediction scores for cell-based methods. Finally, we examined which methods performed favorably when trained and tested on separate patient cohorts in scenarios similar to clinical applications, and which were able to accurately label particularly small or under-represented cell populations in the given datasets. We conclude that scPred and SVM show the best overall performances with cancer-specific data and provide further suggestions for algorithm selection. Our analysis pipeline for assessing the performance of cell type labelling algorithms is available in https://github.com/shooshtarilab/scRNAseq-Automated-Cell-Type-Labelling. |
format | Online Article Text |
id | pubmed-9851326 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-98513262023-01-20 Evaluation of single-cell RNAseq labelling algorithms using cancer datasets Christensen, Erik Luo, Ping Turinsky, Andrei Husić, Mia Mahalanabis, Alaina Naidas, Alaine Diaz-Mejia, Juan Javier Brudno, Michael Pugh, Trevor Ramani, Arun Shooshtari, Parisa Brief Bioinform Review Single-cell RNA sequencing (scRNA-seq) clustering and labelling methods are used to determine precise cellular composition of tissue samples. Automated labelling methods rely on either unsupervised, cluster-based approaches or supervised, cell-based approaches to identify cell types. The high complexity of cancer poses a unique challenge, as tumor microenvironments are often composed of diverse cell subpopulations with unique functional effects that may lead to disease progression, metastasis and treatment resistance. Here, we assess 17 cell-based and 9 cluster-based scRNA-seq labelling algorithms using 8 cancer datasets, providing a comprehensive large-scale assessment of such methods in a cancer-specific context. Using several performance metrics, we show that cell-based methods generally achieved higher performance and were faster compared to cluster-based methods. Cluster-based methods more successfully labelled non-malignant cell types, likely because of a lack of gene signatures for relevant malignant cell subpopulations. Larger cell numbers present in some cell types in training data positively impacted prediction scores for cell-based methods. Finally, we examined which methods performed favorably when trained and tested on separate patient cohorts in scenarios similar to clinical applications, and which were able to accurately label particularly small or under-represented cell populations in the given datasets. We conclude that scPred and SVM show the best overall performances with cancer-specific data and provide further suggestions for algorithm selection. 
Our analysis pipeline for assessing the performance of cell type labelling algorithms is available in https://github.com/shooshtarilab/scRNAseq-Automated-Cell-Type-Labelling. Oxford University Press 2022-12-30 /pmc/articles/PMC9851326/ /pubmed/36585784 http://dx.doi.org/10.1093/bib/bbac561 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Review Christensen, Erik Luo, Ping Turinsky, Andrei Husić, Mia Mahalanabis, Alaina Naidas, Alaine Diaz-Mejia, Juan Javier Brudno, Michael Pugh, Trevor Ramani, Arun Shooshtari, Parisa Evaluation of single-cell RNAseq labelling algorithms using cancer datasets |
title | Evaluation of single-cell RNAseq labelling algorithms using cancer datasets |
title_full | Evaluation of single-cell RNAseq labelling algorithms using cancer datasets |
title_fullStr | Evaluation of single-cell RNAseq labelling algorithms using cancer datasets |
title_full_unstemmed | Evaluation of single-cell RNAseq labelling algorithms using cancer datasets |
title_short | Evaluation of single-cell RNAseq labelling algorithms using cancer datasets |
title_sort | evaluation of single-cell rnaseq labelling algorithms using cancer datasets |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9851326/ https://www.ncbi.nlm.nih.gov/pubmed/36585784 http://dx.doi.org/10.1093/bib/bbac561 |
work_keys_str_mv | AT christensenerik evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT luoping evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT turinskyandrei evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT husicmia evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT mahalanabisalaina evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT naidasalaine evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT diazmejiajuanjavier evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT brudnomichael evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT pughtrevor evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT ramaniarun evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets AT shooshtariparisa evaluationofsinglecellrnaseqlabellingalgorithmsusingcancerdatasets |