CLAIRE: contrastive learning-based batch correction framework for better balance between batch mixing and preservation of cellular heterogeneity


Bibliographic Details
Main Authors: Yan, Xuhua, Zheng, Ruiqing, Wu, Fangxiang, Li, Min
Format: Online Article Text
Language: English
Published: Oxford University Press, 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985174/
https://www.ncbi.nlm.nih.gov/pubmed/36821425
http://dx.doi.org/10.1093/bioinformatics/btad099
author Yan, Xuhua
Zheng, Ruiqing
Wu, Fangxiang
Li, Min
collection PubMed
description MOTIVATION: Integration of growing single-cell RNA sequencing (scRNA-seq) datasets helps better understand cellular identity and function. The major challenge for integration is removing batch effects while preserving biological heterogeneity. Advances in contrastive learning have inspired several contrastive learning-based batch correction methods. However, existing contrastive learning-based methods exhibit a noticeably ad hoc trade-off between batch mixing and preservation of cellular heterogeneity (mix-heterogeneity trade-off). Therefore, a deliberate mix-heterogeneity trade-off is expected to yield considerable improvements in scRNA-seq dataset integration. RESULTS: We develop a novel contrastive learning-based batch correction framework, CLAIRE, which achieves a superior mix-heterogeneity trade-off. The key contribution of CLAIRE is the proposal of two complementary strategies, a construction strategy and a refinement strategy, that improve the appropriateness of positive pairs. The construction strategy dynamically generates positive pairs by augmenting inter-batch mutual nearest neighbors (MNN) with intra-batch k-nearest neighbors (KNN), which improves the coverage of positive pairs over the whole distribution of cell types shared between batches. The refinement strategy automatically reduces potential false positive pairs produced by the construction strategy by exploiting the memory effect of deep neural networks. We demonstrate that CLAIRE achieves a better mix-heterogeneity trade-off than existing contrastive learning-based methods. Benchmark results on six real datasets also show that CLAIRE achieves the best integration performance against eight state-of-the-art methods. Finally, comprehensive experiments are conducted to validate the effectiveness of CLAIRE. AVAILABILITY AND IMPLEMENTATION: The source code and data used in this study are available at https://github.com/CSUBioGroup/CLAIRE-release.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
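The abstract describes the two positive-pair strategies only at a high level. The Python sketch that follows illustrates one possible reading under stated assumptions: brute-force Euclidean neighbours, an augmentation rule that resamples the endpoints of each MNN pair from their intra-batch KNN, and a generic small-loss criterion standing in for the memory-effect refinement. All function names (`knn_indices`, `mnn_pairs`, `augmented_positive_pairs`, `small_loss_filter`) and parameters (`k_mnn`, `k_knn`, `drop_rate`) are hypothetical; CLAIRE's actual implementation is in the CLAIRE-release repository and may differ.

```python
import numpy as np


def knn_indices(query, ref, k, exclude_self=False):
    """Indices of the k nearest rows of `ref` for each row of `query`
    (squared Euclidean distance, brute force)."""
    d = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)
    if exclude_self:  # query and ref are the same batch: a cell is not its own neighbour
        np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def mnn_pairs(xa, xb, k):
    """Inter-batch mutual nearest neighbours: (i, j) such that B-cell j is among
    the k nearest B-cells of A-cell i, and i is among the k nearest A-cells of j."""
    a_to_b = knn_indices(xa, xb, k)
    b_to_a = knn_indices(xb, xa, k)
    return [(i, int(j)) for i in range(len(xa)) for j in a_to_b[i] if i in b_to_a[j]]


def augmented_positive_pairs(xa, xb, k_mnn=3, k_knn=2, rng=None):
    """Hypothetical construction step: each MNN anchor pair (i, j) yields a
    positive pair (p, q), with p drawn from {i} plus i's intra-batch KNN in A
    and q drawn from {j} plus j's intra-batch KNN in B. Calling this again
    (e.g. once per epoch) resamples the pairs, making the set dynamic."""
    rng = np.random.default_rng() if rng is None else rng
    knn_a = knn_indices(xa, xa, k_knn, exclude_self=True)
    knn_b = knn_indices(xb, xb, k_knn, exclude_self=True)
    pairs = []
    for i, j in mnn_pairs(xa, xb, k_mnn):
        p = int(rng.choice(np.append(knn_a[i], i)))
        q = int(rng.choice(np.append(knn_b[j], j)))
        pairs.append((p, q))
    return pairs


def small_loss_filter(pair_losses, drop_rate):
    """Hypothetical refinement step (small-loss criterion): keep the
    (1 - drop_rate) fraction of pairs with the lowest loss, on the premise that
    networks fit clean pairs before noisy ones (the memory effect)."""
    losses = np.asarray(pair_losses, dtype=float)
    n_keep = int(round(len(losses) * (1.0 - drop_rate)))
    return np.sort(np.argsort(losses)[:n_keep])


# Toy data: two cell types per batch; batch B is batch A shifted by 0.5.
xa = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
xb = xa + 0.5
cell_type = np.array([0, 0, 0, 1, 1, 1])
pairs = augmented_positive_pairs(xa, xb, rng=np.random.default_rng(0))
print(all(cell_type[p] == cell_type[q] for p, q in pairs))  # True
print(small_loss_filter([0.1, 2.0, 0.3, 5.0], drop_rate=0.5))  # [0 2]
```

On this toy data every augmented pair links cells of the same type, because the two cell types are far apart. On real data, sampling from intra-batch KNN can introduce cross-type pairs, which is the kind of false positive a refinement step is meant to prune.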
format Online
Article
Text
id pubmed-9985174
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9985174 2023-03-05 Bioinformatics Original Paper Oxford University Press 2023-02-23 /pmc/articles/PMC9985174/ /pubmed/36821425 http://dx.doi.org/10.1093/bioinformatics/btad099 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CLAIRE: contrastive learning-based batch correction framework for better balance between batch mixing and preservation of cellular heterogeneity
topic Original Paper