Sequence based residue depth prediction using evolutionary information and predicted secondary structure

BACKGROUND: Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional si...

Bibliographic Details
Main Authors: Zhang, Hua; Zhang, Tuo; Chen, Ke; Shen, Shiyi; Ruan, Jishou; Kurgan, Lukasz
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2567998/
https://www.ncbi.nlm.nih.gov/pubmed/18803867
http://dx.doi.org/10.1186/1471-2105-9-388
_version_ 1782160026172915712
author Zhang, Hua
Zhang, Tuo
Chen, Ke
Shen, Shiyi
Ruan, Jishou
Kurgan, Lukasz
collection PubMed
description BACKGROUND: Residue depth measures how deeply a given residue is buried, whereas solvent accessibility only differentiates between buried and solvent-exposed residues. Compared with solvent accessibility, depth allows the study of deep-level structures, functional sites, and the formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design. RESULTS: A new method, RDPred, for real-value depth prediction from protein sequence is proposed. RDPred combines information extracted from the sequence, PSI-BLAST scoring matrices, and secondary structure predicted with PSIPRED. Three-fold/ten-fold cross-validation tests performed on three independent, low-identity datasets show that the distance-based depth (computed using MSMS) predicted by RDPred has correlations of 0.67/0.67, 0.66/0.67, and 0.64/0.65 with the actual depth, mean absolute errors of 0.56/0.56, 0.61/0.60, and 0.58/0.57, and mean relative errors of 17.0%/16.9%, 18.2%/18.1%, and 17.7%/17.6%, respectively. The mean absolute and mean relative errors are statistically significantly better than those of a method recently proposed by Yuan and Wang [Proteins 2008; 70:509–516]. The results show that three-fold cross-validation underestimates the variability of the prediction quality compared with ten-fold cross-validation. We also show that hydrophilic and flexible residues are predicted more accurately than hydrophobic and rigid residues. Similarly, the charged residues (Lys, Glu, Asp, and Arg) are the most accurately predicted. 
Our analysis reveals that evolutionary information encoded using PSSM correlates more strongly with depth for hydrophilic and aliphatic amino acids (AAs) than for hydrophobic and aromatic AAs. We also show that coil and strand secondary structure is useful in depth prediction, in contrast to helices, which are distributed relatively uniformly over the protein depth. Applying the predicted residue depth to the prediction of buried/exposed residues consistently improves detection rates of both buried and exposed residues compared with the competing method. Finally, we contrasted prediction performance among distance-based (MSMS and DPX) and volume-based (SADIC) depth definitions. We found that the distance-based indices are harder to predict because the corresponding depth profiles are more complex. CONCLUSION: The proposed method, RDPred, provides statistically significantly better predictions of residue depth than the competing method. The predicted depth can be used to improve the prediction of both buried and exposed residues. The prediction of exposed residues has implications for the characterization and prediction of interactions with ligands and other proteins, while the prediction of buried residues could be used in the context of folding predictions and simulations.
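The abstract reports three quality measures for real-value depth prediction: correlation with the actual depth, mean absolute error, and mean relative error. The sketch below shows one plausible way to compute them in Python. It is not the authors' code. The function name `depth_metrics` is hypothetical, the correlation is assumed to be the Pearson coefficient, and the relative error is assumed to be the absolute error divided by the actual depth, averaged over residues and given in percent; this record does not state the exact definitions.

```python
import math


def depth_metrics(actual, predicted):
    """Return (pearson_r, mean_abs_error, mean_rel_error_percent).

    Assumptions (not taken from the record): Pearson correlation, and
    relative error = |actual - predicted| / actual for each residue.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    if any(a <= 0 for a in actual):
        raise ValueError("actual depths must be positive for the relative error")

    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    # Pearson correlation between actual and predicted depth.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / math.sqrt(var_a * var_p)

    # Mean absolute error, in the same unit as the depth values.
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

    # Mean relative error in percent (assumed per-residue normalization).
    mre = 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n

    return r, mae, mre


if __name__ == "__main__":
    # Made-up toy values for illustration only; not data from the paper.
    toy_actual = [2.0, 3.5, 5.0, 1.8]
    toy_predicted = [2.4, 3.0, 4.1, 2.0]
    r, mae, mre = depth_metrics(toy_actual, toy_predicted)
    print(f"r={r:.2f}  MAE={mae:.2f}  MRE={mre:.1f}%")
```

In a cross-validation setting such as the three-fold and ten-fold tests mentioned above, a function like this would typically be applied to the predictions pooled over the held-out folds, or to each fold separately to gauge variability.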
format Text
id pubmed-2567998
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-25679982008-10-16 Sequence based residue depth prediction using evolutionary information and predicted secondary structure Zhang, Hua Zhang, Tuo Chen, Ke Shen, Shiyi Ruan, Jishou Kurgan, Lukasz BMC Bioinformatics Methodology Article BACKGROUND: Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design. RESULTS: A new method, RDPred, for the real-value depth prediction from protein sequence is proposed. RDPred combines information extracted from the sequence, PSI-BLAST scoring matrices, and secondary structure predicted with PSIPRED. Three-fold/ten-fold cross validation based tests performed on three independent, low-identity datasets show that the distance based depth (computed using MSMS) predicted by RDPred is characterized by 0.67/0.67, 0.66/0.67, and 0.64/0.65 correlation with the actual depth, by the mean absolute errors equal 0.56/0.56, 0.61/0.60, and 0.58/0.57, and by the mean relative errors equal 17.0%/16.9%, 18.2%/18.1%, and 17.7%/17.6%, respectively. The mean absolute and the mean relative errors are shown to be statistically significantly better when compared with a method recently proposed by Yuan and Wang [Proteins 2008; 70:509–516]. The results show that three-fold cross validation underestimates the variability of the prediction quality when compared with the results based on the ten-fold cross validation. We also show that the hydrophilic and flexible residues are predicted more accurately than hydrophobic and rigid residues. 
Similarly, the charged residues that include Lys, Glu, Asp, and Arg are the most accurately predicted. Our analysis reveals that evolutionary information encoded using PSSM is characterized by stronger correlation with the depth for hydrophilic amino acids (AAs) and aliphatic AAs when compared with hydrophobic AAs and aromatic AAs. Finally, we show that the secondary structure of coils and strands is useful in depth prediction, in contrast to helices that have relatively uniform distribution over the protein depth. Application of the predicted residue depth to prediction of buried/exposed residues shows consistent improvements in detection rates of both buried and exposed residues when compared with the competing method. Finally, we contrasted the prediction performance among distance based (MSMS and DPX) and volume based (SADIC) depth definitions. We found that the distance based indices are harder to predict due to the more complex nature of the corresponding depth profiles. CONCLUSION: The proposed method, RDPred, provides statistically significantly better predictions of residue depth when compared with the competing method. The predicted depth can be used to provide improved prediction of both buried and exposed residues. The prediction of exposed residues has implications in characterization/prediction of interactions with ligands and other proteins, while the prediction of buried residues could be used in the context of folding predictions and simulations. BioMed Central 2008-09-20 /pmc/articles/PMC2567998/ /pubmed/18803867 http://dx.doi.org/10.1186/1471-2105-9-388 Text en Copyright © 2008 Zhang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Zhang, Hua
Zhang, Tuo
Chen, Ke
Shen, Shiyi
Ruan, Jishou
Kurgan, Lukasz
Sequence based residue depth prediction using evolutionary information and predicted secondary structure
title Sequence based residue depth prediction using evolutionary information and predicted secondary structure
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2567998/
https://www.ncbi.nlm.nih.gov/pubmed/18803867
http://dx.doi.org/10.1186/1471-2105-9-388