
deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors

It is well recognized that batch effect in single-cell RNA sequencing (scRNA-seq) data remains a big challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effect in scRNA-seq data. We first searched mutual nearest neighbor (MNN)...


Bibliographic Details
Main Authors: Zou, Bin, Zhang, Tongda, Zhou, Ruilong, Jiang, Xiaosen, Yang, Huanming, Jin, Xin, Bai, Yong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8383340/
https://www.ncbi.nlm.nih.gov/pubmed/34447413
http://dx.doi.org/10.3389/fgene.2021.708981
author Zou, Bin
Zhang, Tongda
Zhou, Ruilong
Jiang, Xiaosen
Yang, Huanming
Jin, Xin
Bai, Yong
author_sort Zou, Bin
collection PubMed
description It is well recognized that batch effect in single-cell RNA sequencing (scRNA-seq) data remains a big challenge when integrating different datasets. Here, we proposed deepMNN, a novel deep learning-based method to correct batch effect in scRNA-seq data. We first searched mutual nearest neighbor (MNN) pairs across different batches in a principal component analysis (PCA) subspace. Subsequently, a batch correction network was constructed by stacking two residual blocks and further applied for the removal of batch effects. The loss function of deepMNN was defined as the sum of a batch loss and a weighted regularization loss. The batch loss was used to compute the distance between cells in MNN pairs in the PCA subspace, while the regularization loss was to make the output of the network similar to the input. The experiment results showed that deepMNN can successfully remove batch effects across datasets with identical cell types, datasets with non-identical cell types, datasets with multiple batches, and large-scale datasets as well. We compared the performance of deepMNN with state-of-the-art batch correction methods, including the widely used methods of Harmony, Scanorama, and Seurat V4 as well as the recently developed deep learning-based methods of MMD-ResNet and scGen. The results demonstrated that deepMNN achieved a better or comparable performance in terms of both qualitative analysis using uniform manifold approximation and projection (UMAP) plots and quantitative metrics such as batch and cell entropies, ARI F1 score, and ASW F1 score under various scenarios. Additionally, deepMNN allowed for integrating scRNA-seq datasets with multiple batches in one step. Furthermore, deepMNN ran much faster than the other methods for large-scale datasets. These characteristics of deepMNN made it have the potential to be a new choice for large-scale single-cell gene expression data analysis.
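The two ingredients named in the description, the MNN pair search and the combined loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, the averaging, the default `reg_weight`, and the toy 2-D coordinates (standing in for PCA coordinates) are all assumptions made for illustration, and the residual network itself is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, reference, k):
    """Indices of the k points in `reference` closest to `query`."""
    order = sorted(range(len(reference)),
                   key=lambda j: euclidean(query, reference[j]))
    return set(order[:k])


def find_mnn_pairs(batch_a, batch_b, k=1):
    """Return (i, j) pairs where cell i of batch A and cell j of batch B
    are each among the other's k nearest neighbors."""
    nn_ab = [k_nearest(a, batch_b, k) for a in batch_a]
    nn_ba = [k_nearest(b, batch_a, k) for b in batch_b]
    return [(i, j)
            for i, neighbors in enumerate(nn_ab)
            for j in sorted(neighbors)
            if i in nn_ba[j]]


def total_loss(out_a, out_b, in_a, in_b, pairs, reg_weight=0.1):
    """Batch loss plus weighted regularization loss (assumed forms).

    Batch loss: mean distance between network outputs of MNN-paired cells.
    Regularization loss: mean distance between each output and its input.
    """
    batch_loss = sum(euclidean(out_a[i], out_b[j]) for i, j in pairs)
    batch_loss /= max(len(pairs), 1)
    inputs, outputs = in_a + in_b, out_a + out_b
    reg_loss = sum(euclidean(o, x) for o, x in zip(outputs, inputs))
    reg_loss /= len(inputs)
    return batch_loss + reg_weight * reg_loss


# Toy "PCA" coordinates: batch B is batch A shifted by +1 on the first axis,
# plus one cell with no counterpart in batch A.
batch_a = [[0.0, 0.0], [5.0, 5.0]]
batch_b = [[1.0, 0.0], [6.0, 5.0], [20.0, 20.0]]

pairs = find_mnn_pairs(batch_a, batch_b, k=1)

# An identity "network" leaves the batch shift in place: only batch loss remains.
loss_uncorrected = total_loss(batch_a, batch_b, batch_a, batch_b, pairs)

# A hypothetical corrected output for batch B removes the shift at a small
# regularization cost.
corrected_b = [[0.0, 0.0], [5.0, 5.0], [20.0, 20.0]]
loss_corrected = total_loss(batch_a, corrected_b, batch_a, batch_b, pairs)
```

In this toy case the unmatched cell in batch B forms no pair, so it contributes only to the regularization term, which is one way to read how the approach can handle batches with non-identical cell types.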
format Online
Article
Text
id pubmed-8383340
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8383340 2021-08-25 Front Genet Genetics Frontiers Media S.A. 2021-08-10 /pmc/articles/PMC8383340/ /pubmed/34447413 http://dx.doi.org/10.3389/fgene.2021.708981 Text en Copyright © 2021 Zou, Zhang, Zhou, Jiang, Yang, Jin and Bai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title deepMNN: Deep Learning-Based Single-Cell RNA Sequencing Data Batch Correction Using Mutual Nearest Neighbors
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8383340/
https://www.ncbi.nlm.nih.gov/pubmed/34447413
http://dx.doi.org/10.3389/fgene.2021.708981