Generalizing predictions to unseen sequencing profiles via deep generative models

Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challe...

Bibliographic Details
Main authors: Oh, Min; Zhang, Liqing
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065080/
https://www.ncbi.nlm.nih.gov/pubmed/35504956
http://dx.doi.org/10.1038/s41598-022-11363-w
_version_ 1784699506376310784
author Oh, Min
Zhang, Liqing
collection PubMed
description Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challenging problem to generalize the predictive models across studies without any prior knowledge of the unseen data distribution. Here, this study proposes DeepBioGen, a sequencing profile augmentation procedure that characterizes visual patterns of sequencing profiles, generates realistic profiles based on a deep generative model capturing the patterns, and generalizes the subsequent classifiers. DeepBioGen outperforms other methods in terms of enhancing the generalizability of the prediction models on unseen data. The generalized classifiers surpass the state-of-the-art method, evaluated on RNA sequencing tumor expression profiles for anti-PD1 therapy response prediction and WGS human gut microbiome profiles for type 2 diabetes diagnosis.
format Online
Article
Text
id pubmed-9065080
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9065080 2022-05-04 Generalizing predictions to unseen sequencing profiles via deep generative models Oh, Min Zhang, Liqing Sci Rep Article Nature Publishing Group UK 2022-05-03 /pmc/articles/PMC9065080/ /pubmed/35504956 http://dx.doi.org/10.1038/s41598-022-11363-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Generalizing predictions to unseen sequencing profiles via deep generative models
topic Article