Generalizing predictions to unseen sequencing profiles via deep generative models
Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challe...
Main Authors: | Oh, Min; Zhang, Liqing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065080/ https://www.ncbi.nlm.nih.gov/pubmed/35504956 http://dx.doi.org/10.1038/s41598-022-11363-w |
_version_ | 1784699506376310784 |
---|---|
author | Oh, Min; Zhang, Liqing |
author_facet | Oh, Min; Zhang, Liqing |
author_sort | Oh, Min |
collection | PubMed |
description | Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challenging problem to generalize the predictive models across studies without any prior knowledge of the unseen data distribution. Here, this study proposes DeepBioGen, a sequencing profile augmentation procedure that characterizes visual patterns of sequencing profiles, generates realistic profiles based on a deep generative model capturing the patterns, and generalizes the subsequent classifiers. DeepBioGen outperforms other methods in terms of enhancing the generalizability of the prediction models on unseen data. The generalized classifiers surpass the state-of-the-art method, evaluated on RNA sequencing tumor expression profiles for anti-PD1 therapy response prediction and WGS human gut microbiome profiles for type 2 diabetes diagnosis. |
format | Online Article Text |
id | pubmed-9065080 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9065080 2022-05-04 Generalizing predictions to unseen sequencing profiles via deep generative models Oh, Min Zhang, Liqing Sci Rep Article Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challenging problem to generalize the predictive models across studies without any prior knowledge of the unseen data distribution. Here, this study proposes DeepBioGen, a sequencing profile augmentation procedure that characterizes visual patterns of sequencing profiles, generates realistic profiles based on a deep generative model capturing the patterns, and generalizes the subsequent classifiers. DeepBioGen outperforms other methods in terms of enhancing the generalizability of the prediction models on unseen data. The generalized classifiers surpass the state-of-the-art method, evaluated on RNA sequencing tumor expression profiles for anti-PD1 therapy response prediction and WGS human gut microbiome profiles for type 2 diabetes diagnosis. Nature Publishing Group UK 2022-05-03 /pmc/articles/PMC9065080/ /pubmed/35504956 http://dx.doi.org/10.1038/s41598-022-11363-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Oh, Min Zhang, Liqing Generalizing predictions to unseen sequencing profiles via deep generative models |
title | Generalizing predictions to unseen sequencing profiles via deep generative models |
title_full | Generalizing predictions to unseen sequencing profiles via deep generative models |
title_fullStr | Generalizing predictions to unseen sequencing profiles via deep generative models |
title_full_unstemmed | Generalizing predictions to unseen sequencing profiles via deep generative models |
title_short | Generalizing predictions to unseen sequencing profiles via deep generative models |
title_sort | generalizing predictions to unseen sequencing profiles via deep generative models |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065080/ https://www.ncbi.nlm.nih.gov/pubmed/35504956 http://dx.doi.org/10.1038/s41598-022-11363-w |
work_keys_str_mv | AT ohmin generalizingpredictionstounseensequencingprofilesviadeepgenerativemodels AT zhangliqing generalizingpredictionstounseensequencingprofilesviadeepgenerativemodels |