Better prediction of protein contact number using a support vector regression analysis of amino acid sequence

BACKGROUND: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C(β )...

Bibliographic Details
Main author:	Yuan, Zheng
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1277819/ https://www.ncbi.nlm.nih.gov/pubmed/16221309 http://dx.doi.org/10.1186/1471-2105-6-248

_version_	1782126034551832576
author	Yuan, Zheng
author_facet	Yuan, Zheng
author_sort	Yuan, Zheng
collection	PubMed
description	BACKGROUND: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C(β) atoms in other residues within a sphere around the C(β) atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. RESULTS: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. CONCLUSION: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.
format	Text
id	pubmed-1277819
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1277819 2006-11-29 Better prediction of protein contact number using a support vector regression analysis of amino acid sequence Yuan, Zheng BMC Bioinformatics Research Article BACKGROUND: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C(β) atoms in other residues within a sphere around the C(β) atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. RESULTS: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds.
CONCLUSION: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties. BioMed Central 2005-10-13 /pmc/articles/PMC1277819/ /pubmed/16221309 http://dx.doi.org/10.1186/1471-2105-6-248 Text en Copyright © 2005 Yuan; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Yuan, Zheng Better prediction of protein contact number using a support vector regression analysis of amino acid sequence
title	Better prediction of protein contact number using a support vector regression analysis of amino acid sequence
title_full	Better prediction of protein contact number using a support vector regression analysis of amino acid sequence
title_fullStr	Better prediction of protein contact number using a support vector regression analysis of amino acid sequence
title_full_unstemmed	Better prediction of protein contact number using a support vector regression analysis of amino acid sequence
title_short	Better prediction of protein contact number using a support vector regression analysis of amino acid sequence
title_sort	better prediction of protein contact number using a support vector regression analysis of amino acid sequence
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1277819/ https://www.ncbi.nlm.nih.gov/pubmed/16221309 http://dx.doi.org/10.1186/1471-2105-6-248
work_keys_str_mv	AT yuanzheng betterpredictionofproteincontactnumberusingasupportvectorregressionanalysisofaminoacidsequence
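
The contact number definition in the description above (the count of C(β) atoms of other residues inside a sphere centred on a residue's own C(β) atom) can be illustrated with a minimal Python sketch. This is not the article's code. The function names, the default radius of 12.0 Å, and the toy coordinates are assumptions made here for illustration, since the record does not state the sphere radius. The second function computes a Pearson correlation coefficient, which is assumed to be the kind of correlation the abstract reports between predicted and observed contact numbers.

```python
import math


def contact_numbers(cb_coords, radius=12.0):
    """Return, for each residue, how many C-beta atoms of OTHER residues
    lie within `radius` of its own C-beta atom.

    `radius` (in angstroms) is an assumed value; the catalog record
    does not give the sphere radius used in the article.
    """
    counts = []
    for i, ci in enumerate(cb_coords):
        n = 0
        for j, cj in enumerate(cb_coords):
            # Skip the residue itself; count neighbours inside the sphere.
            if i != j and math.dist(ci, cj) <= radius:
                n += 1
        counts.append(n)
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical C-beta coordinates (angstroms) for a three-residue toy chain.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
observed = contact_numbers(toy_coords, radius=5.0)
```

In this toy chain only the first two residues fall within 5 Å of each other, so they each have one contact and the third has none. A regression model such as the support vector regression named in the abstract would be trained to predict these per-residue counts from sequence features, and `pearson` could then compare its predictions against the observed values.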

Similar items