Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme

BACKGROUND: The prediction of solvent accessibility could provide valuable clues for analyzing protein structure and functions, such as protein 3-Dimensional structure and B-cell epitope prediction. To fully decipher the protein-protein interaction process, an initial but crucial step is to calculat...

Bibliographic Details
Main authors: Zhang, Jian, Chen, Wenhan, Sun, Pingping, Zhao, Xiaowei, Ma, Zhiqiang
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4608127/
https://www.ncbi.nlm.nih.gov/pubmed/26478747
http://dx.doi.org/10.1186/s13040-014-0031-3
_version_ 1782395614373347328
author Zhang, Jian
Chen, Wenhan
Sun, Pingping
Zhao, Xiaowei
Ma, Zhiqiang
author_facet Zhang, Jian
Chen, Wenhan
Sun, Pingping
Zhao, Xiaowei
Ma, Zhiqiang
author_sort Zhang, Jian
collection PubMed
description BACKGROUND: The prediction of solvent accessibility could provide valuable clues for analyzing protein structure and functions, such as protein three-dimensional structure and B-cell epitope prediction. To fully decipher the protein-protein interaction process, an initial but crucial step is to calculate the protein solvent accessibility, especially when the tertiary structure of the protein is unknown. Although some efforts have been put into protein solvent accessibility prediction, the performance of existing methods is far from satisfactory. METHODS: In order to develop a high-accuracy model, we focus on some possible aspects concerning the prediction performance, including several sequence-derived features, a weighted sliding window scheme and parameter optimization of the machine learning approach. To address these issues, we take the following strategies. Firstly, we explore various features which have been observed to be associated with residue solvent accessibility. These discriminative features include protein evolutionary information, predicted protein secondary structure, native disorder, physicochemical propensities and several sequence-based structural descriptors of residues. Secondly, the different contributions of adjacent residues in the sliding window are observed; thus, a weighted sliding window scheme is proposed to differentiate the contributions of adjacent residues to the central residue. Thirdly, particle swarm optimization (PSO) is employed to search for the globally best parameters for the proposed predictor. RESULTS: Evaluated by 3-fold cross-validation, our method achieves a mean absolute error (MAE) of 14.1% and a Pearson correlation coefficient (PCC) of 0.75 on our newly compiled dataset. When compared with the state-of-the-art prediction models on the two benchmark datasets, our method demonstrates better performance.
Experimental results demonstrate that our PSAP achieves high performance and outperforms many existing predictors. A web server called PSAP has been built and is freely available at http://59.73.198.144:8088/SolventAccessibility/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-014-0031-3) contains supplementary material, which is available to authorized users.
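The weighted sliding window and the two evaluation measures named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state how the window weights are chosen, so the geometric decay used here, along with the names window_weights, weighted_window_features and the decay parameter, are assumptions made purely for illustration. Positions beyond the ends of the sequence are zero-padded, which is also an assumption.

```python
import math


def window_weights(half_width, decay=0.5):
    """Hypothetical weighting: 1.0 at the central residue, decaying with distance.

    The geometric decay is an assumption; the record does not give the real weights.
    """
    return [decay ** abs(offset) for offset in range(-half_width, half_width + 1)]


def weighted_window_features(per_residue, half_width, decay=0.5):
    """Build one feature vector per residue from its weighted neighbours.

    per_residue: list of equal-length feature lists, one list per residue.
    Window positions outside the sequence are zero-padded (an assumption).
    """
    n_features = len(per_residue[0])
    weights = window_weights(half_width, decay)
    offsets = range(-half_width, half_width + 1)
    rows = []
    for centre in range(len(per_residue)):
        row = []
        for weight, offset in zip(weights, offsets):
            pos = centre + offset
            if 0 <= pos < len(per_residue):
                row.extend(weight * value for value in per_residue[pos])
            else:
                row.extend([0.0] * n_features)
        rows.append(row)
    return rows


def mean_absolute_error(observed, predicted):
    """Mean absolute error (MAE) between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_correlation(x, y):
    """Pearson correlation coefficient (PCC) between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy input: three residues, one feature each.
    toy = [[1.0], [2.0], [3.0]]
    for row in weighted_window_features(toy, half_width=1):
        print(row)
```

In a pipeline like the one the description outlines, vectors of this kind would be passed to a regressor whose hyperparameters are tuned separately; the sketch covers only feature construction and scoring.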
format Online
Article
Text
id pubmed-4608127
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-46081272015-10-17 Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme Zhang, Jian Chen, Wenhan Sun, Pingping Zhao, Xiaowei Ma, Zhiqiang BioData Min Research BACKGROUND: The prediction of solvent accessibility could provide valuable clues for analyzing protein structure and functions, such as protein three-dimensional structure and B-cell epitope prediction. To fully decipher the protein-protein interaction process, an initial but crucial step is to calculate the protein solvent accessibility, especially when the tertiary structure of the protein is unknown. Although some efforts have been put into protein solvent accessibility prediction, the performance of existing methods is far from satisfactory. METHODS: In order to develop a high-accuracy model, we focus on some possible aspects concerning the prediction performance, including several sequence-derived features, a weighted sliding window scheme and parameter optimization of the machine learning approach. To address these issues, we take the following strategies. Firstly, we explore various features which have been observed to be associated with residue solvent accessibility. These discriminative features include protein evolutionary information, predicted protein secondary structure, native disorder, physicochemical propensities and several sequence-based structural descriptors of residues. Secondly, the different contributions of adjacent residues in the sliding window are observed; thus, a weighted sliding window scheme is proposed to differentiate the contributions of adjacent residues to the central residue. Thirdly, particle swarm optimization (PSO) is employed to search for the globally best parameters for the proposed predictor. RESULTS: Evaluated by 3-fold cross-validation, our method achieves a mean absolute error (MAE) of 14.1% and a Pearson correlation coefficient (PCC) of 0.75 on our newly compiled dataset.
When compared with the state-of-the-art prediction models on the two benchmark datasets, our method demonstrates better performance. Experimental results demonstrate that our PSAP achieves high performance and outperforms many existing predictors. A web server called PSAP has been built and is freely available at http://59.73.198.144:8088/SolventAccessibility/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-014-0031-3) contains supplementary material, which is available to authorized users. BioMed Central 2015-01-31 /pmc/articles/PMC4608127/ /pubmed/26478747 http://dx.doi.org/10.1186/s13040-014-0031-3 Text en © Zhang et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Zhang, Jian
Chen, Wenhan
Sun, Pingping
Zhao, Xiaowei
Ma, Zhiqiang
Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme
title Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme
title_full Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme
title_fullStr Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme
title_full_unstemmed Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme
title_short Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme
title_sort prediction of protein solvent accessibility using pso-svr with multiple sequence-derived features and weighted sliding window scheme
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4608127/
https://www.ncbi.nlm.nih.gov/pubmed/26478747
http://dx.doi.org/10.1186/s13040-014-0031-3