RCA2: a scalable supervised clustering algorithm that reduces batch effects in scRNA-seq data
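Main authors: | Schmidt, Florian; Ranjan, Bobby; Lin, Quy Xiao Xuan; Krishnan, Vaidehi; Joanito, Ignasius; Honardoost, Mohammad Amin; Nawaz, Zahid; Venkatesh, Prasanna Nori; Tan, Joanna; Rayan, Nirmala Arul; Ong, Sin Tiong; Prabhakar, Shyam |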
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8344557/ https://www.ncbi.nlm.nih.gov/pubmed/34320202 http://dx.doi.org/10.1093/nar/gkab632 |
author | Schmidt, Florian; Ranjan, Bobby; Lin, Quy Xiao Xuan; Krishnan, Vaidehi; Joanito, Ignasius; Honardoost, Mohammad Amin; Nawaz, Zahid; Venkatesh, Prasanna Nori; Tan, Joanna; Rayan, Nirmala Arul; Ong, Sin Tiong; Prabhakar, Shyam |
---|---|
collection | PubMed |
description | The transcriptomic diversity of cell types in the human body can be analysed in unprecedented detail using single cell (SC) technologies. Unsupervised clustering of SC transcriptomes, which is the default technique for defining cell types, is prone to group cells by technical, rather than biological, variation. Compared to de-novo (unsupervised) clustering, we demonstrate using multiple benchmarks that supervised clustering, which uses reference transcriptomes as a guide, is robust to batch effects and data quality artifacts. Here, we present RCA2, the first algorithm to combine reference projection (batch effect robustness) with graph-based clustering (scalability). In addition, RCA2 provides a user-friendly framework incorporating multiple commonly used downstream analysis modules. RCA2 also provides new reference panels for human and mouse and supports generation of custom panels. Furthermore, RCA2 facilitates cell type-specific QC, which is essential for accurate clustering of data from heterogeneous tissues. We demonstrate the advantages of RCA2 on SC data from human bone marrow, healthy PBMCs and PBMCs from COVID-19 patients. Scalable supervised clustering methods such as RCA2 will facilitate unified analysis of cohort-scale SC datasets. |
format | Online Article Text |
id | pubmed-8344557 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res (Nucleic Acids Research) |
date | 2021-07-28 |
rights | © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | RCA2: a scalable supervised clustering algorithm that reduces batch effects in scRNA-seq data |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8344557/ https://www.ncbi.nlm.nih.gov/pubmed/34320202 http://dx.doi.org/10.1093/nar/gkab632 |
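The description summarises RCA2 as reference projection (for batch-effect robustness) followed by graph-based clustering (for scalability). The following is a hypothetical, standard-library-only Python sketch of that general idea. It is not the RCA2 implementation. It assumes that cells and reference profiles are dense expression vectors over the same gene order and uses Pearson correlation as the projection. For the clustering step, it substitutes connected components of a mutual k-nearest-neighbour graph for the community detection a production tool would use. The names `pearson`, `project` and `knn_components`, and the toy data, are invented for illustration.

```python
"""Toy sketch of supervised clustering by reference projection.

Hypothetical illustration, not RCA2's code. Assumptions: cells and reference
profiles are dense expression vectors over the same genes, the projection is
Pearson correlation, and clusters are connected components of a mutual
k-nearest-neighbour graph (a stand-in for community detection).
"""
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def project(cells: list[list[float]], reference: list[list[float]]) -> list[list[float]]:
    """Describe each cell by its correlation with every reference profile."""
    return [[pearson(cell, ref) for ref in reference] for cell in cells]


def knn_components(points: list[list[float]], k: int = 1) -> list[int]:
    """Label points by connected components of a mutual k-nearest-neighbour graph."""
    def dist(u, v):
        return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    n = len(points)
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        neighbours.append(set(others[:k]))

    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over mutual edges only
            i = stack.pop()
            for j in neighbours[i]:
                if i in neighbours[j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Two hypothetical reference profiles over four genes ("T-like", "B-like").
reference = [[10, 0, 5, 0], [0, 10, 0, 5]]
# Cells 2 and 4 mimic a batch effect: the same profiles as cells 1 and 3,
# rescaled (x2) and offset (+3).
cells = [[9, 1, 4, 0], [21, 5, 11, 3], [1, 9, 0, 4], [5, 21, 3, 11]]
labels = knn_components(project(cells, reference), k=1)
print(labels)  # prints [0, 0, 1, 1]
```

Pearson correlation is unchanged by positive rescaling and constant offsets. As a result, the shifted cells receive the same projected coordinates as their counterparts and fall into the same cluster. This is only a toy illustration of the robustness the description attributes to reference projection; real batch effects are rarely this simple.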