Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties
Single amino acid variations (SAVs) potentially alter biological functions, including causing diseases or natural differences between individuals. Identifying the relationship between a SAV and certain disease provides the starting point for understanding the underlying mechanisms of specific associ...
Main authors: | Pan, Yuliang; Liu, Diwei; Deng, Lei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470696/ https://www.ncbi.nlm.nih.gov/pubmed/28614374 http://dx.doi.org/10.1371/journal.pone.0179314 |
_version_ | 1783243810730409984 |
---|---|
author | Pan, Yuliang Liu, Diwei Deng, Lei |
author_facet | Pan, Yuliang Liu, Diwei Deng, Lei |
author_sort | Pan, Yuliang |
collection | PubMed |
description | Single amino acid variations (SAVs) potentially alter biological functions, including causing diseases or natural differences between individuals. Identifying the relationship between a SAV and certain disease provides the starting point for understanding the underlying mechanisms of specific associations, and can help further prevention and diagnosis of inherited disease. We propose PredSAV, a computational method that can effectively predict how likely SAVs are to be associated with disease by incorporating gradient tree boosting (GTB) algorithm and optimally selected neighborhood features. A two-step feature selection approach is used to explore the most relevant and informative neighborhood properties that contribute to the prediction of disease association of SAVs across a wide range of sequence and structural features, especially some novel structural neighborhood features. In cross-validation experiments on the benchmark dataset, PredSAV achieves promising performances with an AUC score of 0.908 and a specificity of 0.838, which are significantly better than that of the other existing methods. Furthermore, we validate the capability of our proposed method by an independent test and gain a competitive advantage as a result. PredSAV, which combines gradient tree boosting with optimally selected neighborhood features, can return reliable predictions in distinguishing between disease-associated and neutral variants. Compared with existing methods, PredSAV shows improved specificity as well as increased overall performance. |
format | Online Article Text |
id | pubmed-5470696 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5470696 2017-07-03 Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties Pan, Yuliang Liu, Diwei Deng, Lei PLoS One Research Article Single amino acid variations (SAVs) potentially alter biological functions, including causing diseases or natural differences between individuals. Identifying the relationship between a SAV and certain disease provides the starting point for understanding the underlying mechanisms of specific associations, and can help further prevention and diagnosis of inherited disease. We propose PredSAV, a computational method that can effectively predict how likely SAVs are to be associated with disease by incorporating gradient tree boosting (GTB) algorithm and optimally selected neighborhood features. A two-step feature selection approach is used to explore the most relevant and informative neighborhood properties that contribute to the prediction of disease association of SAVs across a wide range of sequence and structural features, especially some novel structural neighborhood features. In cross-validation experiments on the benchmark dataset, PredSAV achieves promising performances with an AUC score of 0.908 and a specificity of 0.838, which are significantly better than that of the other existing methods. Furthermore, we validate the capability of our proposed method by an independent test and gain a competitive advantage as a result. PredSAV, which combines gradient tree boosting with optimally selected neighborhood features, can return reliable predictions in distinguishing between disease-associated and neutral variants. Compared with existing methods, PredSAV shows improved specificity as well as increased overall performance. 
Public Library of Science 2017-06-14 /pmc/articles/PMC5470696/ /pubmed/28614374 http://dx.doi.org/10.1371/journal.pone.0179314 Text en © 2017 Pan et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Pan, Yuliang Liu, Diwei Deng, Lei Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties |
title | Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470696/ https://www.ncbi.nlm.nih.gov/pubmed/28614374 http://dx.doi.org/10.1371/journal.pone.0179314 |