
Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data

MOTIVATION: Predictive models are a powerful tool for solving complex problems in computational biology. They are typically designed to predict or classify data coming from the same unknown distribution as the training data. In many real-world settings, however, uncontrolled biological or technical...


Bibliographic Details
Main Authors: Handl, Lisa; Jalali, Adrin; Scherer, Michael; Eggeling, Ralf; Pfeifer, Nico
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612879/
https://www.ncbi.nlm.nih.gov/pubmed/31510704
http://dx.doi.org/10.1093/bioinformatics/btz338
author Handl, Lisa
Jalali, Adrin
Scherer, Michael
Eggeling, Ralf
Pfeifer, Nico
author_sort Handl, Lisa
collection PubMed
description MOTIVATION: Predictive models are a powerful tool for solving complex problems in computational biology. They are typically designed to predict or classify data coming from the same unknown distribution as the training data. In many real-world settings, however, uncontrolled biological or technical factors can lead to a distribution mismatch between datasets acquired at different times, causing model performance to deteriorate on new data. A common additional obstacle in computational biology is scarce data with many more features than samples. To address these problems, we propose a method for unsupervised domain adaptation that is based on a weighted elastic net. The key idea of our approach is to compare dependencies between inputs in training and test data and to increase the cost of differently behaving features in the elastic net regularization term. In doing so, we encourage the model to assign a higher importance to features that are robust and behave similarly across domains. RESULTS: We evaluate our method both on simulated data with varying degrees of distribution mismatch and on real data, considering the problem of age prediction based on DNA methylation data across multiple tissues. Compared with a non-adaptive standard model, our approach substantially reduces errors on samples with a mismatched distribution. On real data, we achieve far lower errors on cerebellum samples, a tissue which is not part of the training data and poorly predicted by standard models. Our results demonstrate that unsupervised domain adaptation is possible for applications in computational biology, even with many more features than samples. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/PfeiferLabTue/wenda. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
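The key idea in the abstract above, raising the elastic net penalty on features that behave differently in training and test data, can be illustrated with a minimal sketch. It is not the authors' implementation (that is at the GitHub URL given in the abstract). The abstract does not say how dependencies between inputs are compared, so the correlation-difference score in `mismatch_weights`, its `scale` parameter, the synthetic data, and all function names are assumptions made for illustration only.

```python
import random


def correlation(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def mismatch_weights(X_src, X_tgt, scale=10.0):
    """Hypothetical penalty weights (an assumption, not the paper's procedure):
    1 + scale * mean |corr_src(j, k) - corr_tgt(j, k)| over all other features k."""
    p = len(X_src[0])
    cols_s = [[row[j] for row in X_src] for j in range(p)]
    cols_t = [[row[j] for row in X_tgt] for j in range(p)]
    weights = []
    for j in range(p):
        diffs = [abs(correlation(cols_s[j], cols_s[k]) - correlation(cols_t[j], cols_t[k]))
                 for k in range(p) if k != j]
        weights.append(1.0 + scale * sum(diffs) / len(diffs))
    return weights


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_elastic_net(X, y, weights, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n) * ||y - X b||^2 + lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j^2).
    No intercept is fitted, so X and y are assumed to be roughly centered."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    beta = [0.0] * p
    residual = list(y)  # equals y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            # Covariance of feature j with the residual that excludes its own contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new = soft_threshold(rho, lam * alpha * weights[j]) / (z + lam * (1 - alpha) * weights[j])
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new
    return beta


def make_domain(n, coupled):
    """Synthetic data: three features share a latent factor; feature 2 is decoupled if not coupled."""
    rows = []
    for _ in range(n):
        z = random.gauss(0, 1)
        f0 = z + 0.3 * random.gauss(0, 1)
        f1 = z + 0.3 * random.gauss(0, 1)
        f2 = z + 0.3 * random.gauss(0, 1) if coupled else random.gauss(0, 1)
        rows.append([f0, f1, f2])
    return rows


random.seed(0)
X_src = make_domain(200, coupled=True)    # labeled training (source) domain
X_tgt = make_domain(200, coupled=False)   # unlabeled test (target) domain
y_src = [row[0] + row[2] + 0.1 * random.gauss(0, 1) for row in X_src]

weights = mismatch_weights(X_src, X_tgt)
beta_uniform = weighted_elastic_net(X_src, y_src, [1.0] * 3)
beta_weighted = weighted_elastic_net(X_src, y_src, weights)
print("weights:", weights)
print("uniform penalty:", beta_uniform)
print("weighted penalty:", beta_weighted)
```

In this toy setup, feature 2 loses its dependence on the other features in the target domain, so it receives the largest penalty weight and the weighted fit shifts coefficient mass toward the two features whose behavior is stable across domains. Only unlabeled target inputs are used to compute the weights, which is what makes the adaptation unsupervised.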
format Online
Article
Text
id pubmed-6612879
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6612879 2019-07-12 Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data. Handl, Lisa; Jalali, Adrin; Scherer, Michael; Eggeling, Ralf; Pfeifer, Nico. Bioinformatics, ISMB/ECCB 2019 Conference Proceedings. Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612879/ /pubmed/31510704 http://dx.doi.org/10.1093/bioinformatics/btz338 Text en
© The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data
topic ISMB/ECCB 2019 Conference Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612879/
https://www.ncbi.nlm.nih.gov/pubmed/31510704
http://dx.doi.org/10.1093/bioinformatics/btz338