Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics

Bibliographic Details
Main authors: Li, Hui; McCarthy, Davis J.; Shim, Heejung; Wei, Susan
Format: Online article, text
Language: English
Published: BioMed Central, 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635149/
https://www.ncbi.nlm.nih.gov/pubmed/36329399
http://dx.doi.org/10.1186/s12859-022-05003-3
Collection: PubMed
Record ID: pubmed-9635149
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed

Description:
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low-dimensional latent representations to better understand the overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird’s-eye view of the trade-offs between these two conflicting objectives. Specifically, using the well-established concept of the Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects.

RESULTS: A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate that Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effects with a neural-network-based estimator called Mutual Information Neural Estimation (MINE) and show its benefits over the more standard maximum mean discrepancy (MMD) measure.

CONCLUSION: The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.
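
To make the contrast between a Pareto front and naive scalarization concrete, here is a minimal sketch on made-up numbers. It is not the authors' Pareto MTL procedure: it assumes each hypothetical trained model is summarized by two losses to minimize (one for failing to conserve biological variation, one for the remaining batch effect, e.g. as measured by MINE or MMD), and the names `dominates`, `pareto_front` and `scalarized_choice` are illustrative only.

```python
"""Illustrative sketch (not the paper's Pareto MTL): Pareto-front extraction
versus weighted-sum scalarization on hypothetical objective values."""

from typing import List, Sequence, Tuple

# Hypothetical objective pair for one trained model, both treated as losses
# to minimize: (loss for conserving biological variation, batch-effect measure).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates, sorted by the first objective."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))


def scalarized_choice(points: Sequence[Point], w: float) -> Point:
    """Point minimizing the weighted sum w * f1 + (1 - w) * f2, with w in [0, 1].

    Ties go to the earliest point in `points`, because min() keeps the first minimum.
    """
    return min(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])


if __name__ == "__main__":
    # Made-up values for four hypothetical models; (0.8, 0.9) is dominated by (0.6, 0.6).
    candidates = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0), (0.8, 0.9)]
    front = pareto_front(candidates)
    # Sweep the scalarization weight over an 11-point grid from 0.0 to 1.0.
    reached = {scalarized_choice(candidates, i / 10) for i in range(11)}
    print("Pareto front:", front)
    # -> Pareto front: [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]
    print("Reached by weighted sums:", sorted(reached))
    # -> Reached by weighted sums: [(0.0, 1.0), (1.0, 0.0)]
```

The point (0.6, 0.6) is non-dominated, yet no weight w in [0, 1] selects it: its weighted sum is always 0.6, while the better of the two extreme points scores min(w, 1 - w), which is at most 0.5. This reflects a general property of linear scalarization (it only recovers points on the convex hull of the attainable set) and is one commonly cited motivation for preference-based multi-objective methods.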

Journal: BMC Bioinformatics (Research)
Publication date: 2022-11-03
Rights: © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.