NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans
Main authors: | Caron, Barthélémy; Luo, Yufei; Rausell, Antonio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371618/ https://www.ncbi.nlm.nih.gov/pubmed/30744685 http://dx.doi.org/10.1186/s13059-019-1634-2 |
author | Caron, Barthélémy; Luo, Yufei; Rausell, Antonio |
collection | PubMed |
description | State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, a comprehensive set of recent and ongoing purifying selection signals in humans is explored, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on such features achieves a high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse learning and independent testing data sets and outperforms other existing reference methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1634-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6371618 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6371618 2019-02-25 · Genome Biol · Method · BioMed Central, 2019-02-11 · Text · en · © The Author(s). 2019. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans |
topic | Method |