Integrating single-cell RNA-seq datasets with substantial batch effects
Computational methods for integrating scRNA-seq datasets often struggle to harmonize datasets with substantial differences driven by technical or biological variation, such as between different species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-...
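Main Authors: | Hrovatin, Karin; Moinfar, Amir Ali; Lapuerta, Alejandro Tejada; Zappia, Luke; Lengerich, Ben; Kellis, Manolis; Theis, Fabian J. |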
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635119/ https://www.ncbi.nlm.nih.gov/pubmed/37961672 http://dx.doi.org/10.1101/2023.11.03.565463 |
_version_ | 1785146291212255232 |
---|---|
author | Hrovatin, Karin; Moinfar, Amir Ali; Lapuerta, Alejandro Tejada; Zappia, Luke; Lengerich, Ben; Kellis, Manolis; Theis, Fabian J. |
author_sort | Hrovatin, Karin |
collection | PubMed |
description | Computational methods for integrating scRNA-seq datasets often struggle to harmonize datasets with substantial differences driven by technical or biological variation, such as between different species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-nuclei. Given that many widely adopted and scalable methods are based on conditional variational autoencoders (cVAE), we hypothesize that machine learning interventions to standard cVAEs can help improve batch effect removal while potentially preserving biological variation more effectively. To address this, we assess four strategies applied to commonly used cVAE models: the previously proposed Kullback–Leibler divergence (KL) regularization tuning and adversarial learning, as well as cycle-consistency loss (previously applied to multi-omic integration) and the multimodal variational mixture of posteriors prior (VampPrior) that has not yet been applied to integration. We evaluated performance in three data settings, namely cross-species, organoid-tissue, and cell-nuclei integration. Cycle-consistency and VampPrior improved batch correction while retaining high biological preservation, with their combination further increasing performance. While adversarial learning led to the strongest batch correction, its preservation of within-cell type variation did not match that of VampPrior or cycle-consistency models, and it was also prone to mixing unrelated cell types with different proportions across batches. KL regularization strength tuning had the least favorable performance, as it jointly removed biological and batch variation by reducing the number of effectively used embedding dimensions. Based on our findings, we recommend the adoption of the VampPrior in combination with the cycle-consistency loss for integrating datasets with substantial batch effects. |
format | Online Article Text |
id | pubmed-10635119 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10635119 2023-11-13 Integrating single-cell RNA-seq datasets with substantial batch effects Hrovatin, Karin; Moinfar, Amir Ali; Lapuerta, Alejandro Tejada; Zappia, Luke; Lengerich, Ben; Kellis, Manolis; Theis, Fabian J. bioRxiv Article Cold Spring Harbor Laboratory 2023-11-05 /pmc/articles/PMC10635119/ /pubmed/37961672 http://dx.doi.org/10.1101/2023.11.03.565463 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | Integrating single-cell RNA-seq datasets with substantial batch effects |
topic | Article |
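
The description above notes that tuning the KL regularization strength reduces the number of effectively used embedding dimensions. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a diagonal Gaussian posterior and a standard normal prior, and the function names, the `kl_weight` parameter, and the `threshold` for calling a dimension "active" are all hypothetical choices made for illustration.

```python
import math


def kl_per_dimension(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), computed separately per latent dimension."""
    return [0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mu, log_var)]


def weighted_kl(mu, log_var, kl_weight=1.0):
    """KL term of a VAE loss for one cell, scaled by a tunable weight (assumed name)."""
    return kl_weight * sum(kl_per_dimension(mu, log_var))


def count_active_dimensions(batch_mu, batch_log_var, threshold=0.01):
    """Count latent dimensions whose mean KL over cells exceeds a threshold.

    A dimension whose posterior matches the prior for every cell (KL near 0)
    carries no information about the input; the threshold is an assumption.
    """
    n_cells = len(batch_mu)
    n_dims = len(batch_mu[0])
    mean_kl = [0.0] * n_dims
    for mu, log_var in zip(batch_mu, batch_log_var):
        for d, kl in enumerate(kl_per_dimension(mu, log_var)):
            mean_kl[d] += kl / n_cells
    return sum(1 for kl in mean_kl if kl > threshold)
```

With a larger `kl_weight`, an optimizer is pushed to drive more dimensions toward the prior, so `count_active_dimensions` would tend to fall; this sketch only measures that effect on given posterior parameters and does not train a model.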