Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data
MOTIVATION: Predictive models are a powerful tool for solving complex problems in computational biology. They are typically designed to predict or classify data coming from the same unknown distribution as the training data. In many real-world settings, however, uncontrolled biological or technical...
Main authors: | Handl, Lisa; Jalali, Adrin; Scherer, Michael; Eggeling, Ralf; Pfeifer, Nico |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Ismb/Eccb 2019 Conference Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612879/ https://www.ncbi.nlm.nih.gov/pubmed/31510704 http://dx.doi.org/10.1093/bioinformatics/btz338 |
_version_ | 1783432957532307456 |
---|---|
author | Handl, Lisa Jalali, Adrin Scherer, Michael Eggeling, Ralf Pfeifer, Nico |
author_sort | Handl, Lisa |
collection | PubMed |
description | MOTIVATION: Predictive models are a powerful tool for solving complex problems in computational biology. They are typically designed to predict or classify data coming from the same unknown distribution as the training data. In many real-world settings, however, uncontrolled biological or technical factors can lead to a distribution mismatch between datasets acquired at different times, causing model performance to deteriorate on new data. A common additional obstacle in computational biology is scarce data with many more features than samples. To address these problems, we propose a method for unsupervised domain adaptation that is based on a weighted elastic net. The key idea of our approach is to compare dependencies between inputs in training and test data and to increase the cost of differently behaving features in the elastic net regularization term. In doing so, we encourage the model to assign a higher importance to features that are robust and behave similarly across domains. RESULTS: We evaluate our method both on simulated data with varying degrees of distribution mismatch and on real data, considering the problem of age prediction based on DNA methylation data across multiple tissues. Compared with a non-adaptive standard model, our approach substantially reduces errors on samples with a mismatched distribution. On real data, we achieve far lower errors on cerebellum samples, a tissue which is not part of the training data and poorly predicted by standard models. Our results demonstrate that unsupervised domain adaptation is possible for applications in computational biology, even with many more features than samples. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/PfeiferLabTue/wenda. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6612879 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6612879 2019-07-12 Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data Handl, Lisa Jalali, Adrin Scherer, Michael Eggeling, Ralf Pfeifer, Nico Bioinformatics Ismb/Eccb 2019 Conference Proceedings Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612879/ /pubmed/31510704 http://dx.doi.org/10.1093/bioinformatics/btz338 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data |
topic | Ismb/Eccb 2019 Conference Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612879/ https://www.ncbi.nlm.nih.gov/pubmed/31510704 http://dx.doi.org/10.1093/bioinformatics/btz338 |
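
The description above says that the method increases the cost of differently behaving features in the elastic net regularization term. The following is a minimal sketch of that general idea only, not the authors' implementation (which is at the GitHub URL given in the description). It assumes the objective (1/2n)·||y − Xb||² + lam·Σ_j w_j·(alpha·|b_j| + (1 − alpha)/2·b_j²) and solves it by plain coordinate descent. The function names, the parameters `lam` and `alpha`, and the hand-picked per-feature `weights` are all hypothetical; how the weights are derived from training and test data is not shown here.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for an L1 penalty)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_elastic_net(X, y, weights, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for an elastic net with per-feature penalty weights.

    A larger weights[j] makes feature j more expensive to use, so features
    assumed to behave differently across domains get smaller coefficients.
    Assumes no column of X is identically zero and that there is no intercept.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = np.asarray(y, dtype=float).copy()  # equals y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n             # (1/n) * ||x_j||^2
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            l1 = lam * alpha * weights[j]
            l2 = lam * (1.0 - alpha) * weights[j]
            new_beta = soft_threshold(rho, l1) / (col_sq[j] + l2)
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_beta)
            beta[j] = new_beta
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = X[:, 0] + X[:, 1]
    # Hypothetical weights: feature 1 is treated as unreliable across domains.
    w = np.array([1.0, 50.0, 1.0, 1.0, 1.0])
    print(weighted_elastic_net(X, y, w))
```

With uniform weights this reduces to an ordinary elastic net; raising a single weight pushes that feature's coefficient toward zero while the remaining features absorb the fit.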