Integrating single-cell RNA-seq datasets with substantial batch effects

Computational methods for integrating scRNA-seq datasets often struggle to harmonize datasets with substantial differences driven by technical or biological variation, such as between different species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-...

Bibliographic Details
Main authors: Hrovatin, Karin; Moinfar, Amir Ali; Lapuerta, Alejandro Tejada; Zappia, Luke; Lengerich, Ben; Kellis, Manolis; Theis, Fabian J.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635119/
https://www.ncbi.nlm.nih.gov/pubmed/37961672
http://dx.doi.org/10.1101/2023.11.03.565463
author Hrovatin, Karin
Moinfar, Amir Ali
Lapuerta, Alejandro Tejada
Zappia, Luke
Lengerich, Ben
Kellis, Manolis
Theis, Fabian J.
collection PubMed
description Computational methods for integrating scRNA-seq datasets often struggle to harmonize datasets with substantial differences driven by technical or biological variation, such as between different species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-nuclei. Given that many widely adopted and scalable methods are based on conditional variational autoencoders (cVAE), we hypothesize that machine learning interventions to standard cVAEs can help improve batch effect removal while potentially preserving biological variation more effectively. To address this, we assess four strategies applied to commonly used cVAE models: the previously proposed Kullback–Leibler divergence (KL) regularization tuning and adversarial learning, as well as cycle-consistency loss (previously applied to multi-omic integration) and the multimodal variational mixture of posteriors prior (VampPrior) that has not yet been applied to integration. We evaluated performance in three data settings, namely cross-species, organoid-tissue, and cell-nuclei integration. Cycle-consistency and VampPrior improved batch correction while retaining high biological preservation, with their combination further increasing performance. While adversarial learning led to the strongest batch correction, its preservation of within-cell type variation did not match that of VampPrior or cycle-consistency models, and it was also prone to mixing unrelated cell types with different proportions across batches. KL regularization strength tuning had the least favorable performance, as it jointly removed biological and batch variation by reducing the number of effectively used embedding dimensions. Based on our findings, we recommend the adoption of the VampPrior in combination with the cycle-consistency loss for integrating datasets with substantial batch effects.
format Online
Article
Text
id pubmed-10635119
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10635119 2023-11-13 bioRxiv Article Cold Spring Harbor Laboratory 2023-11-05 /pmc/articles/PMC10635119/ /pubmed/37961672 http://dx.doi.org/10.1101/2023.11.03.565463 Text en This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Integrating single-cell RNA-seq datasets with substantial batch effects
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635119/
https://www.ncbi.nlm.nih.gov/pubmed/37961672
http://dx.doi.org/10.1101/2023.11.03.565463