A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics
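Main authors: | Lakkis, Justin; Wang, David; Zhang, Yuanchao; Hu, Gang; Wang, Kui; Pan, Huize; Ungar, Lyle; Reilly, Muredach P.; Li, Xiangjie; Li, Mingyao |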
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press 2021 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494213/ https://www.ncbi.nlm.nih.gov/pubmed/34035047 http://dx.doi.org/10.1101/gr.271874.120 |
_version_ | 1784579263088820224 |
---|---|
author | Lakkis, Justin Wang, David Zhang, Yuanchao Hu, Gang Wang, Kui Pan, Huize Ungar, Lyle Reilly, Muredach P. Li, Xiangjie Li, Mingyao |
author_sort | Lakkis, Justin |
collection | PubMed |
description | Recent developments of single-cell RNA-seq (scRNA-seq) technologies have led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effects, which are inevitable in studies involving human tissues. Most existing methods remove batch effects in a low-dimensional embedding space. Although useful for clustering, batch effects are still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effects. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Methods such as Seurat 3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effects in gene expression, but MNN can only analyze two batches at a time, and it becomes computationally infeasible when the number of batches is large. Here, we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data while correcting batch effects both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC outperforms Scanorama, DCA + Combat, scVI, and MNN. With CarDEC denoising, non-highly variable genes offer as much signal for clustering as the highly variable genes (HVGs), suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC's denoised and batch-corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effects. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies. |
format | Online Article Text |
id | pubmed-8494213 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8494213 2021-10-07 Genome Res Method Cold Spring Harbor Laboratory Press 2021-10 /pmc/articles/PMC8494213/ /pubmed/34035047 http://dx.doi.org/10.1101/gr.271874.120 Text en © 2021 Lakkis et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at https://creativecommons.org/licenses/by-nc/4.0/. |
title | A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics |
topic | Method |