
PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility

Bibliographic details
Main authors: Fan, Chao; Liu, Diwei; Huang, Rui; Chen, Zhigang; Deng, Lei
Format: Online article, text
Language: English
Published: BioMed Central, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895273/
https://www.ncbi.nlm.nih.gov/pubmed/26818760
http://dx.doi.org/10.1186/s12859-015-0851-2

Abstract
BACKGROUND: Protein solvent accessibility prediction is a pivotal intermediate step towards modeling protein tertiary structures directly from one-dimensional sequences. It also plays an important part in identifying protein folds and domains. Although several methods for protein solvent accessibility prediction have been presented in recent years, their performance is far from satisfactory. In this work, we propose PredRSA, a computational method that can accurately predict the relative solvent accessible surface area (RSA) of residues by exploring various local and global sequence features that have been observed to be associated with solvent accessibility. Based on these features, Gradient Boosted Regression Trees (GBRT), a novel and efficient approach, is adopted for the first time to predict RSA.

RESULTS: Experimental results obtained from 5-fold cross-validation on the Manesh-215 dataset show that the mean absolute error (MAE) and the Pearson correlation coefficient (PCC) of PredRSA are 9.0% and 0.75, respectively, both better than those of existing methods. Moreover, we evaluate the performance of PredRSA on an independent test set of 68 proteins. Compared with the state-of-the-art approaches SPINE-X and ASAquick, PredRSA achieves a significant improvement in prediction quality.

CONCLUSIONS: Our experimental results show that the Gradient Boosted Regression Trees algorithm and the novel feature combination are quite effective for relative solvent accessibility prediction. The proposed PredRSA method could assist protein structure prediction by using the predicted RSA as restraints.
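
The abstract names the learning algorithm (GBRT) and the evaluation protocol (5-fold cross-validation scored by MAE and PCC) without showing how they fit together. The Python sketch below illustrates these generic techniques only; it is not PredRSA's implementation, and its loss, tree depth, number of trees, learning rate, fold assignment and features are all assumptions. It uses squared-error loss, so each boosting round fits a depth-1 regression tree (a stump) to the current residuals, which are the negative gradient of that loss, and adds a shrunken copy of the stump to the model. RSA is treated as a fraction in [0, 1] (a residue's solvent accessible surface area divided by the maximum value for its residue type), so an MAE of 0.09 would correspond to the 9.0% reported above. The helper names (`fit_stump`, `gbrt_fit`, `gbrt_predict`, `cross_validate`) and the one-feature toy data are hypothetical.

```python
import math
from statistics import fmean

# Minimal sketch under stated assumptions: squared loss, stumps, contiguous folds.
# Not the PredRSA code; all names and data here are hypothetical.


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted RSA values."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mt, mp = fmean(y_true), fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def fit_stump(X, r):
    """Least-squares depth-1 regression tree: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= thr]
            right = [ri for x, ri in zip(X, r) if x[j] > thr]
            if not right:  # the largest value leaves the right side empty
                continue
            lv, rv = fmean(left), fmean(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # every feature is constant: fall back to a single leaf
        return (0, math.inf, fmean(r), fmean(r))
    return best[1:]


def predict_stump(stump, x):
    j, thr, lv, rv = stump
    return lv if x[j] <= thr else rv


def gbrt_fit(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss and stumps as weak learners."""
    f0 = fmean(y)  # initial model: the mean target value
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For L = (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, x) for pi, x in zip(pred, X)]
    return f0, learning_rate, stumps


def gbrt_predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)


def cross_validate(X, y, k=5, **params):
    """k-fold CV with contiguous folds; returns (MAE, PCC) on pooled predictions."""
    n = len(y)
    y_pred = [0.0] * n
    for f in range(k):
        test = range(f * n // k, (f + 1) * n // k)
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        model = gbrt_fit([X[i] for i in train], [y[i] for i in train], **params)
        for i in test:
            y_pred[i] = gbrt_predict(model, X[i])
    return mae(y, y_pred), pcc(y, y_pred)


# Hypothetical toy data: one feature per residue; buried (RSA 0.1) vs exposed (RSA 0.7).
X = [[i / 19] for i in range(20)]
y = [0.1 if i < 10 else 0.7 for i in range(20)]

model = gbrt_fit(X, y)
train_mae = mae(y, [gbrt_predict(model, x) for x in X])
cv_mae, cv_pcc = cross_validate(X, y, k=5)
print(f"training MAE={train_mae:.4f}  5-fold MAE={cv_mae:.3f}  PCC={cv_pcc:.3f}")
```

With real data, each row of `X` would hold one residue's sequence-derived features, and folds would more typically be formed per protein rather than by contiguous index ranges, so that residues from one chain never appear in both the training and the test set.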

Record: pubmed-4895273 (National Center for Biotechnology Information; MEDLINE/PubMed format)
Journal: BMC Bioinformatics (Proceedings)
Publication date: 2016-01-11 (BioMed Central)
Copyright: © Fan et al. 2015. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.