FunSAV: Predicting the Functional Effect of Single Amino Acid Variants Using a Two-Stage Random Forest Model

Single amino acid variants (SAVs) are the most abundant form of known genetic variations associated with human disease. Successful prediction of the functional impact of SAVs from sequences can thus lead to an improved understanding of the underlying mechanisms of why a SAV may be associated with ce...

Bibliographic Details
Main authors: Wang, Mingjun; Zhao, Xing-Ming; Takemoto, Kazuhiro; Xu, Haisong; Li, Yuan; Akutsu, Tatsuya; Song, Jiangning
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3427247/
https://www.ncbi.nlm.nih.gov/pubmed/22937107
http://dx.doi.org/10.1371/journal.pone.0043847
author Wang, Mingjun
Zhao, Xing-Ming
Takemoto, Kazuhiro
Xu, Haisong
Li, Yuan
Akutsu, Tatsuya
Song, Jiangning
author_sort Wang, Mingjun
collection PubMed
description Single amino acid variants (SAVs) are the most abundant form of known genetic variations associated with human disease. Successful prediction of the functional impact of SAVs from sequences can thus lead to an improved understanding of the underlying mechanisms of why a SAV may be associated with certain diseases. In this work, we constructed a high-quality structural dataset that contained 679 high-quality protein structures with 2,048 SAVs by collecting human genetic variant data from multiple resources and dividing them into two categories, i.e., disease-associated and neutral variants. We built a two-stage random forest (RF) model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with additional features that were not explored in previous studies. Importantly, a two-step feature selection procedure was proposed to select the most important and informative features that contribute to the prediction of disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved good prediction performance, with an area under the curve (AUC) of 0.882, which is competitive with and in some cases better than other existing tools including SIFT, SNAP, PolyPhen-2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.
format Online
Article
Text
id pubmed-3427247
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3427247 2012-08-30 PLoS One Research Article
Public Library of Science 2012-08-24 /pmc/articles/PMC3427247/ /pubmed/22937107 http://dx.doi.org/10.1371/journal.pone.0043847 Text en © 2012 Wang et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title FunSAV: Predicting the Functional Effect of Single Amino Acid Variants Using a Two-Stage Random Forest Model
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3427247/
https://www.ncbi.nlm.nih.gov/pubmed/22937107
http://dx.doi.org/10.1371/journal.pone.0043847