PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility
BACKGROUND: Protein solvent accessibility prediction is a pivotal intermediate step towards modeling protein tertiary structures directly from one-dimensional sequences. It also plays an important part in identifying protein folds and domains. Although some methods have been presented to the protein...
Main authors: | Fan, Chao; Liu, Diwei; Huang, Rui; Chen, Zhigang; Deng, Lei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895273/ https://www.ncbi.nlm.nih.gov/pubmed/26818760 http://dx.doi.org/10.1186/s12859-015-0851-2 |
_version_ | 1782435815972929536 |
---|---|
author | Fan, Chao Liu, Diwei Huang, Rui Chen, Zhigang Deng, Lei |
author_facet | Fan, Chao Liu, Diwei Huang, Rui Chen, Zhigang Deng, Lei |
author_sort | Fan, Chao |
collection | PubMed |
description | BACKGROUND: Protein solvent accessibility prediction is a pivotal intermediate step towards modeling protein tertiary structures directly from one-dimensional sequences. It also plays an important part in identifying protein folds and domains. Although several methods for protein solvent accessibility prediction have been proposed in recent years, their performance is far from satisfactory. In this work, we propose PredRSA, a computational method that can accurately predict the relative solvent accessible surface area (RSA) of residues by exploring various local and global sequence features that have been observed to be associated with solvent accessibility. Based on these features, a novel and efficient approach, Gradient Boosted Regression Trees (GBRT), is adopted for the first time to predict RSA. RESULTS: Experimental results obtained from 5-fold cross-validation on the Manesh-215 dataset show that the mean absolute error (MAE) and the Pearson correlation coefficient (PCC) of PredRSA are 9.0% and 0.75, respectively, which are better than those of the existing methods. Moreover, we evaluate the performance of PredRSA using an independent test set of 68 proteins. Compared with the state-of-the-art approaches (SPINE-X and ASAquick), PredRSA achieves a significant improvement in prediction quality. CONCLUSIONS: Our experimental results show that the Gradient Boosted Regression Trees algorithm and the novel feature combination are quite effective in relative solvent accessibility prediction. The proposed PredRSA method could assist the prediction of protein structures by supplying the predicted RSA as restraints. |
format | Online Article Text |
id | pubmed-4895273 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4895273 2016-06-10 PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility Fan, Chao Liu, Diwei Huang, Rui Chen, Zhigang Deng, Lei BMC Bioinformatics Proceedings BioMed Central 2016-01-11 /pmc/articles/PMC4895273/ /pubmed/26818760 http://dx.doi.org/10.1186/s12859-015-0851-2 Text en © Fan et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Fan, Chao Liu, Diwei Huang, Rui Chen, Zhigang Deng, Lei PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility |
title | PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility |
title_full | PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility |
title_fullStr | PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility |
title_full_unstemmed | PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility |
title_short | PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility |
title_sort | predrsa: a gradient boosted regression trees approach for predicting protein solvent accessibility |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895273/ https://www.ncbi.nlm.nih.gov/pubmed/26818760 http://dx.doi.org/10.1186/s12859-015-0851-2 |
work_keys_str_mv | AT fanchao predrsaagradientboostedregressiontreesapproachforpredictingproteinsolventaccessibility AT liudiwei predrsaagradientboostedregressiontreesapproachforpredictingproteinsolventaccessibility AT huangrui predrsaagradientboostedregressiontreesapproachforpredictingproteinsolventaccessibility AT chenzhigang predrsaagradientboostedregressiontreesapproachforpredictingproteinsolventaccessibility AT denglei predrsaagradientboostedregressiontreesapproachforpredictingproteinsolventaccessibility |
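
Illustrative note: the description field above names gradient boosted regression trees as the learner and reports mean absolute error (MAE) and Pearson correlation coefficient (PCC) as the evaluation measures. The following is a minimal sketch of those ideas in plain Python, not the authors' PredRSA implementation. It assumes squared-error loss and depth-1 trees (stumps), and every name in it (fit_stump, fit_gbrt, predict, mae, pcc, the toy features and RSA values) is hypothetical and chosen only for illustration.

```python
import math


def fit_stump(xs, residuals):
    """Pick the single-feature threshold split with the lowest squared error."""
    mean = sum(residuals) / len(residuals)
    # Fallback: no split at all (threshold = +inf sends every sample left).
    best = (sum((r - mean) ** 2 for r in residuals), 0, math.inf, mean, mean)
    for j in range(len(xs[0])):
        values = sorted(set(x[j] for x in xs))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_gbrt(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true, y_pred):
    """Pearson correlation coefficient."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mt) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy data (made up): two per-residue features and an RSA value in [0, 1].
toy_features = [[0.1, 0.9], [0.2, 0.7], [0.4, 0.6], [0.6, 0.4], [0.8, 0.3], [0.9, 0.1]]
toy_rsa = [0.05, 0.10, 0.30, 0.55, 0.70, 0.85]

model = fit_gbrt(toy_features, toy_rsa)
fitted = [predict(model, x) for x in toy_features]
print("training MAE:", mae(toy_rsa, fitted))
print("training PCC:", pcc(toy_rsa, fitted))
```

The figures quoted in the record come from 5-fold cross-validation on real protein data, so the training-set numbers printed by this toy run are not comparable to them. A practical setup would more likely use an established library implementation of gradient boosting and evaluate on held-out folds.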