CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq
The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is howev...
Main authors: | Yu, Wenbo; Mahfouz, Ahmed; Reinders, Marcel J. T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8076908/ https://www.ncbi.nlm.nih.gov/pubmed/33927748 http://dx.doi.org/10.3389/fgene.2021.644211 |
_version_ | 1783684784921247744 |
---|---|
author | Yu, Wenbo Mahfouz, Ahmed Reinders, Marcel J. T. |
author_facet | Yu, Wenbo Mahfouz, Ahmed Reinders, Marcel J. T. |
author_sort | Yu, Wenbo |
collection | PubMed |
description | The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters. |
format | Online Article Text |
id | pubmed-8076908 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-80769082021-04-28 CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq Yu, Wenbo Mahfouz, Ahmed Reinders, Marcel J. T. Front Genet Genetics The power of single-cell RNA sequencing (scRNA-seq) in detecting cell heterogeneity or developmental process is becoming more and more evident every day. The granularity of this knowledge is further propelled when combining two batches of scRNA-seq into a single large dataset. This strategy is however hampered by technical differences between these batches. Typically, these batch effects are resolved by matching similar cells across the different batches. Current approaches, however, do not take into account that we can constrain this matching further as cells can also be matched on their cell type identity. We use an auto-encoder to embed two batches in the same space such that cells are matched. To accomplish this, we use a loss function that preserves: (1) cell-cell distances within each of the two batches, as well as (2) cell-cell distances between two batches when the cells are of the same cell-type. The cell-type guidance is unsupervised, i.e., a cell-type is defined as a cluster in the original batch. We evaluated the performance of our cluster-guided batch alignment (CBA) using pancreas and mouse cell atlas datasets, against six state-of-the-art single cell alignment methods: Seurat v3, BBKNN, Scanorama, Harmony, LIGER, and BERMUDA. Compared to other approaches, CBA preserves the cluster separation in the original datasets while still being able to align the two datasets. We confirm that this separation is biologically meaningful by identifying relevant differential expression of genes for these preserved clusters. Frontiers Media S.A. 2021-04-13 /pmc/articles/PMC8076908/ /pubmed/33927748 http://dx.doi.org/10.3389/fgene.2021.644211 Text en Copyright © 2021 Yu, Mahfouz and Reinders. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Yu, Wenbo Mahfouz, Ahmed Reinders, Marcel J. T. CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq |
title | CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq |
title_full | CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq |
title_fullStr | CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq |
title_full_unstemmed | CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq |
title_short | CBA: Cluster-Guided Batch Alignment for Single Cell RNA-seq |
title_sort | cba: cluster-guided batch alignment for single cell rna-seq |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8076908/ https://www.ncbi.nlm.nih.gov/pubmed/33927748 http://dx.doi.org/10.3389/fgene.2021.644211 |
work_keys_str_mv | AT yuwenbo cbaclusterguidedbatchalignmentforsinglecellrnaseq AT mahfouzahmed cbaclusterguidedbatchalignmentforsinglecellrnaseq AT reindersmarceljt cbaclusterguidedbatchalignmentforsinglecellrnaseq |
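
The description field above says the loss preserves (1) cell-cell distances within each batch and (2) cell-cell distances between batches for cells of the same cell type. This record does not give the formula, so the following is only a minimal NumPy sketch of one way such a two-term objective could look, not the authors' implementation. The function names, the use of Euclidean distance, the mean squared error, the equal weighting of the two terms, and the assumption that cluster labels are already comparable across the two batches are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_dist(u, v):
    """Euclidean distance between every row of u and every row of v."""
    diff = u[:, None, :] - v[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cluster_guided_loss(x_a, x_b, z_a, z_b, labels_a, labels_b):
    """Illustrative distance-preservation loss (hypothetical, not from the paper).

    x_a, x_b: original expression matrices of the two batches (cells x genes).
    z_a, z_b: embeddings of the same cells (cells x latent dimensions).
    labels_a, labels_b: cluster labels, assumed comparable across batches.
    """
    # Term 1: distances within each batch should survive the embedding.
    within = 0.0
    for x, z in ((x_a, z_a), (x_b, z_b)):
        within += np.mean((pairwise_dist(x, x) - pairwise_dist(z, z)) ** 2)

    # Term 2: distances between batches, restricted to same-cluster cell pairs.
    same_cluster = labels_a[:, None] == labels_b[None, :]
    cross_err = (pairwise_dist(x_a, x_b) - pairwise_dist(z_a, z_b)) ** 2
    between = cross_err[same_cluster].mean() if same_cluster.any() else 0.0

    return within + between
```

In an auto-encoder setting, a term of this kind would be evaluated on the encoder output during training; here it is written as a plain function so the two constraints can be read separately. An embedding that reproduces all distances exactly yields a loss of zero.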