A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics

Recent developments of single-cell RNA-seq (scRNA-seq) technologies have led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effects, which are inevitable in studies involving human tissues. Most existing methods remove batch ef...

Bibliographic Details
Main authors: Lakkis, Justin; Wang, David; Zhang, Yuanchao; Hu, Gang; Wang, Kui; Pan, Huize; Ungar, Lyle; Reilly, Muredach P.; Li, Xiangjie; Li, Mingyao
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494213/
https://www.ncbi.nlm.nih.gov/pubmed/34035047
http://dx.doi.org/10.1101/gr.271874.120
author Lakkis, Justin
Wang, David
Zhang, Yuanchao
Hu, Gang
Wang, Kui
Pan, Huize
Ungar, Lyle
Reilly, Muredach P.
Li, Xiangjie
Li, Mingyao
collection PubMed
description Recent developments of single-cell RNA-seq (scRNA-seq) technologies have led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effects, which are inevitable in studies involving human tissues. Most existing methods remove batch effects in a low-dimensional embedding space. Although useful for clustering, batch effects are still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effects. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Methods such as Seurat 3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effects in gene expression, but MNN can only analyze two batches at a time, and it becomes computationally infeasible when the number of batches is large. Here, we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data while correcting batch effects both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC outperforms Scanorama, DCA + Combat, scVI, and MNN. With CarDEC denoising, non-highly variable genes offer as much signal for clustering as the highly variable genes (HVGs), suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC's denoised and batch-corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effects. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.
format Online
Article
Text
id pubmed-8494213
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-8494213 2021-10-07 Genome Res Method Cold Spring Harbor Laboratory Press 2021-10 /pmc/articles/PMC8494213/ /pubmed/34035047 http://dx.doi.org/10.1101/gr.271874.120 Text en © 2021 Lakkis et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494213/
https://www.ncbi.nlm.nih.gov/pubmed/34035047
http://dx.doi.org/10.1101/gr.271874.120