deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors
It is well recognized that batch effect in single-cell RNA sequencing (scRNA-seq) data remains a big challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effect in scRNA-seq data. We first searched mutual nearest neighbor (MNN)...
Main Authors: | Zou, Bin; Zhang, Tongda; Zhou, Ruilong; Jiang, Xiaosen; Yang, Huanming; Jin, Xin; Bai, Yong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8383340/ https://www.ncbi.nlm.nih.gov/pubmed/34447413 http://dx.doi.org/10.3389/fgene.2021.708981 |
_version_ | 1783741718514892800 |
---|---|
author | Zou, Bin; Zhang, Tongda; Zhou, Ruilong; Jiang, Xiaosen; Yang, Huanming; Jin, Xin; Bai, Yong |
author_sort | Zou, Bin |
collection | PubMed |
description | It is well recognized that batch effect in single-cell RNA sequencing (scRNA-seq) data remains a big challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effect in scRNA-seq data. We first searched mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and further applied for the removal of batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss was used to compute the distance between cells in MNN pairs in the PCA subspace, while the regularization loss was to make the output of the network similar to the input. The experiment results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets as well. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used methods of Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods of MMD-ResNet and scGen. The results demonstrated that deepMNN achieved a better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allowed for integrating scRNA-seq datasets with multiple batches in one step. Furthermore, deepMNN ran much faster than the other methods for large-scale datasets. These characteristics of deepMNN made it have the potential to be a new choice for large-scale single-cell gene expression data analysis. |
format | Online Article Text |
id | pubmed-8383340 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8383340 2021-08-25 deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors Zou, Bin; Zhang, Tongda; Zhou, Ruilong; Jiang, Xiaosen; Yang, Huanming; Jin, Xin; Bai, Yong Front Genet Genetics Frontiers Media S.A. 2021-08-10 /pmc/articles/PMC8383340/ /pubmed/34447413 http://dx.doi.org/10.3389/fgene.2021.708981 Text en Copyright © 2021 Zou, Zhang, Zhou, Jiang, Yang, Jin and Bai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8383340/ https://www.ncbi.nlm.nih.gov/pubmed/34447413 http://dx.doi.org/10.3389/fgene.2021.708981 |
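The abstract describes two ingredients: mutual nearest neighbor (MNN) pairs found across batches in a PCA subspace, and a loss equal to a batch loss plus a weighted regularization loss. The sketch below illustrates both ideas on tiny hand-made 2-D points. It is not the authors' implementation: the function names, the brute-force search, the choice of mean Euclidean distance for both loss terms, and the `reg_weight` value are all assumptions made for illustration, and the residual network itself is omitted (its outputs are passed in as plain lists).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, reference, k):
    """Indices of the k points in `reference` closest to `query` (brute force)."""
    order = sorted(range(len(reference)),
                   key=lambda j: euclidean(query, reference[j]))
    return set(order[:k])


def find_mnn_pairs(batch_a, batch_b, k=1):
    """Return (i, j) pairs such that batch_a[i] and batch_b[j] each appear
    in the other's k-nearest-neighbor list."""
    nn_ab = [k_nearest(p, batch_b, k) for p in batch_a]
    nn_ba = [k_nearest(q, batch_a, k) for q in batch_b]
    return [(i, j)
            for i, neighbors in enumerate(nn_ab)
            for j in sorted(neighbors)
            if i in nn_ba[j]]


def total_loss(out_a, out_b, in_a, in_b, pairs, reg_weight=0.1):
    """Batch loss plus weighted regularization loss.

    Assumed forms (not taken from the paper):
      batch loss = mean distance between network outputs of paired cells
      reg loss   = mean distance between each cell's output and its input
    """
    batch_loss = sum(euclidean(out_a[i], out_b[j]) for i, j in pairs)
    batch_loss /= max(len(pairs), 1)
    outputs, inputs = out_a + out_b, in_a + in_b
    reg_loss = sum(euclidean(o, x) for o, x in zip(outputs, inputs))
    reg_loss /= max(len(inputs), 1)
    return batch_loss + reg_weight * reg_loss


# Toy "PCA coordinates" for two batches; the third cell of batch_b has no
# counterpart in batch_a and therefore forms no MNN pair.
batch_a = [[0.0, 0.0], [5.0, 5.0]]
batch_b = [[0.5, 0.0], [5.0, 5.5], [20.0, 20.0]]

pairs = find_mnn_pairs(batch_a, batch_b, k=1)

# With an untrained (identity) network, output equals input, so the
# regularization term vanishes and only the batch term contributes.
loss_identity = total_loss(batch_a, batch_b, batch_a, batch_b, pairs)
```

In this toy setup, moving paired cells closer together lowers the batch term while the regularization term grows with how far outputs drift from inputs, which is the trade-off the weighted sum in the abstract expresses. For real data, a vectorized or approximate nearest-neighbor search would replace the brute-force loop.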