
scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

BACKGROUND: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both cl...


Bibliographic Details
Main authors: Ranjan, Bobby; Schmidt, Florian; Sun, Wenjie; Park, Jinyu; Honardoost, Mohammad Amin; Tan, Joanna; Arul Rayan, Nirmala; Prabhakar, Shyam
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8042883/
https://www.ncbi.nlm.nih.gov/pubmed/33845760
http://dx.doi.org/10.1186/s12859-021-04028-4
_version_ 1783678209107165184
author Ranjan, Bobby
Schmidt, Florian
Sun, Wenjie
Park, Jinyu
Honardoost, Mohammad Amin
Tan, Joanna
Arul Rayan, Nirmala
Prabhakar, Shyam
collection PubMed
description BACKGROUND: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. RESULTS: We present scConsensus, an [Formula: see text] framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. CONCLUSIONS: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in [Formula: see text] and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04028-4.
format Online
Article
Text
id pubmed-8042883
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8042883 2021-04-14 BMC Bioinformatics Methodology Article
BioMed Central 2021-04-12 /pmc/articles/PMC8042883/ /pubmed/33845760 http://dx.doi.org/10.1186/s12859-021-04028-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data
topic Methodology Article