RCA2: a scalable supervised clustering algorithm that reduces batch effects in scRNA-seq data

The transcriptomic diversity of cell types in the human body can be analysed in unprecedented detail using single cell (SC) technologies. Unsupervised clustering of SC transcriptomes, which is the default technique for defining cell types, is prone to group cells by technical, rather than biological...

Bibliographic Details
Main Authors: Schmidt, Florian, Ranjan, Bobby, Lin, Quy Xiao Xuan, Krishnan, Vaidehi, Joanito, Ignasius, Honardoost, Mohammad Amin, Nawaz, Zahid, Venkatesh, Prasanna Nori, Tan, Joanna, Rayan, Nirmala Arul, Ong, Sin Tiong, Prabhakar, Shyam
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8344557/
https://www.ncbi.nlm.nih.gov/pubmed/34320202
http://dx.doi.org/10.1093/nar/gkab632
author Schmidt, Florian
Ranjan, Bobby
Lin, Quy Xiao Xuan
Krishnan, Vaidehi
Joanito, Ignasius
Honardoost, Mohammad Amin
Nawaz, Zahid
Venkatesh, Prasanna Nori
Tan, Joanna
Rayan, Nirmala Arul
Ong, Sin Tiong
Prabhakar, Shyam
collection PubMed
description The transcriptomic diversity of cell types in the human body can be analysed in unprecedented detail using single cell (SC) technologies. Unsupervised clustering of SC transcriptomes, which is the default technique for defining cell types, is prone to group cells by technical, rather than biological, variation. Compared to de-novo (unsupervised) clustering, we demonstrate using multiple benchmarks that supervised clustering, which uses reference transcriptomes as a guide, is robust to batch effects and data quality artifacts. Here, we present RCA2, the first algorithm to combine reference projection (batch effect robustness) with graph-based clustering (scalability). In addition, RCA2 provides a user-friendly framework incorporating multiple commonly used downstream analysis modules. RCA2 also provides new reference panels for human and mouse and supports generation of custom panels. Furthermore, RCA2 facilitates cell type-specific QC, which is essential for accurate clustering of data from heterogeneous tissues. We demonstrate the advantages of RCA2 on SC data from human bone marrow, healthy PBMCs and PBMCs from COVID-19 patients. Scalable supervised clustering methods such as RCA2 will facilitate unified analysis of cohort-scale SC datasets.
format Online
Article
Text
id pubmed-8344557
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8344557 2021-08-10 Nucleic Acids Res Computational Biology Oxford University Press 2021-07-28 /pmc/articles/PMC8344557/ /pubmed/34320202 http://dx.doi.org/10.1093/nar/gkab632 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title RCA2: a scalable supervised clustering algorithm that reduces batch effects in scRNA-seq data
topic Computational Biology