Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants
Disease and trait-associated variants represent a tiny minority of all known genetic variation, and therefore there is necessarily an imbalance between the small set of available disease-associated and the much larger set of non-deleterious genomic variation, especially in non-coding regulatory regi...
Main authors: | Schubach, Max; Re, Matteo; Robinson, Peter N.; Valentini, Giorgio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5462751/ https://www.ncbi.nlm.nih.gov/pubmed/28592878 http://dx.doi.org/10.1038/s41598-017-03011-5 |
_version_ | 1783242564292313088 |
---|---|
author | Schubach, Max Re, Matteo Robinson, Peter N. Valentini, Giorgio |
author_sort | Schubach, Max |
collection | PubMed |
description | Disease- and trait-associated variants represent a tiny minority of all known genetic variation, so there is necessarily an imbalance between the small set of available disease-associated variants and the much larger set of non-deleterious genomic variants, especially in non-coding regulatory regions of the human genome. Machine Learning (ML) methods for predicting disease-associated non-coding variants face a chicken-and-egg problem: such variants cannot be easily found without ML, but ML cannot begin to be effective until a sufficient number of instances have been found. Most state-of-the-art ML-based methods do not adopt specific imbalance-aware learning techniques to deal with the imbalanced data that naturally arise in several genome-wide variant scoring problems, resulting in a significant reduction of sensitivity and precision. We present a novel method that adopts imbalance-aware learning strategies based on resampling techniques and a hyper-ensemble approach, and that outperforms state-of-the-art methods in two different contexts: the prediction of non-coding variants associated with Mendelian diseases and with complex diseases. We show that imbalance-aware ML is a key issue for the design of robust and accurate prediction algorithms, and we provide a method and an easy-to-use software tool that can be effectively applied to this challenging prediction task. |
format | Online Article Text |
id | pubmed-5462751 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-5462751 2017-06-08 Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants Schubach, Max Re, Matteo Robinson, Peter N. Valentini, Giorgio Sci Rep Article 
Nature Publishing Group UK 2017-06-07 /pmc/articles/PMC5462751/ /pubmed/28592878 http://dx.doi.org/10.1038/s41598-017-03011-5 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5462751/ https://www.ncbi.nlm.nih.gov/pubmed/28592878 http://dx.doi.org/10.1038/s41598-017-03011-5 |
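
The description above mentions imbalance-aware learning based on resampling combined with an ensemble. The following is a minimal, generic sketch of that idea and not the authors' method or software: each ensemble member is trained on all positives plus an undersampled, equally sized draw of negatives, and the member scores are averaged. The nearest-centroid base learner, the function names, and the toy data are all assumptions chosen for brevity.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_member(pos, neg, rng):
    """Train one ensemble member on a balanced sample.

    The majority (negative) class is undersampled to the size of the
    minority (positive) class. The "model" is just the two class
    centroids (a hypothetical stand-in for a real base learner).
    """
    neg_sample = rng.sample(neg, k=min(len(pos), len(neg)))
    return centroid(pos), centroid(neg_sample)


def member_score(model, x):
    """Score in [0, 1]; higher means closer to the positive centroid."""
    c_pos, c_neg = model
    d_pos, d_neg = sq_dist(x, c_pos), sq_dist(x, c_neg)
    if d_pos + d_neg == 0:
        return 0.5
    return d_neg / (d_pos + d_neg)


def train_ensemble(pos, neg, n_members=10, seed=0):
    """Train n_members models, each on a different undersampled draw."""
    rng = random.Random(seed)
    return [train_member(pos, neg, rng) for _ in range(n_members)]


def ensemble_score(models, x):
    """Average the member scores for one feature vector."""
    return fmean(member_score(m, x) for m in models)


# Toy data (assumption): 10 positives vs. 1000 negatives, a 1:100 imbalance.
data_rng = random.Random(42)
positives = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)] for _ in range(10)]
negatives = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(1000)]

models = train_ensemble(positives, negatives)
score_near_pos = ensemble_score(models, [2.0, 2.0])
score_near_neg = ensemble_score(models, [0.0, 0.0])
```

Because every member sees a different balanced subset, the ensemble as a whole still uses many of the negatives while no single member is dominated by them.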