SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences
BACKGROUND: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus dete...
Main authors: | Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2391167/ https://www.ncbi.nlm.nih.gov/pubmed/18452616 http://dx.doi.org/10.1186/1471-2105-9-226 |
_version_ | 1782155352878350336 |
---|---|
author | Kurgan, Lukasz Cios, Krzysztof Chen, Ke |
author_sort | Kurgan, Lukasz |
collection | PubMed |
description | BACKGROUND: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity in the absence of sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. RESULTS: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. CONCLUSION: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods. |
format | Text |
id | pubmed-2391167 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2391167 2008-05-22 SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences Kurgan, Lukasz Cios, Krzysztof Chen, Ke BMC Bioinformatics Methodology Article BioMed Central 2008-05-01 /pmc/articles/PMC2391167/ /pubmed/18452616 http://dx.doi.org/10.1186/1471-2105-9-226 Text en Copyright © 2008 Kurgan et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences |
topic | Methodology Article |
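
The description above says that most of the SCPRED classifier's inputs are derived from a secondary structure predicted with PSI-PRED. The record does not list the actual nine features, so the following is only a minimal sketch of how features of that general kind could be computed from a three-state prediction string (H = helix, E = strand, C = coil). The function names `longest_run` and `ss_features`, the feature names, and the choice of features are all illustrative assumptions, not the authors' definitions.

```python
from itertools import groupby


def longest_run(ss: str, state: str) -> int:
    """Length of the longest uninterrupted run of `state` in `ss`."""
    return max((len(list(g)) for k, g in groupby(ss) if k == state), default=0)


def ss_features(ss: str) -> dict:
    """Illustrative (hypothetical) features from a 3-state string of H/E/C.

    These are NOT the published SCPRED features; they only show the idea of
    summarizing predicted secondary structure as a short numeric vector.
    """
    if not ss:
        raise ValueError("empty secondary structure string")
    n = len(ss)
    return {
        # Fraction of residues predicted in each state (the "content").
        "content_H": ss.count("H") / n,
        "content_E": ss.count("E") / n,
        "content_C": ss.count("C") / n,
        # Longest helix / strand segment, normalized by chain length.
        "max_seg_H_norm": longest_run(ss, "H") / n,
        "max_seg_E_norm": longest_run(ss, "E") / n,
    }


if __name__ == "__main__":
    # Made-up prediction string, used only to exercise the functions.
    example = "CCHHHHHHCCEEEECC"
    for name, value in ss_features(example).items():
        print(f"{name}: {value:.3f}")
```

A vector like this would then be passed to a classifier such as a support vector machine, as the description states. Keeping the vector short matches the record's remark that the features separate the structural classes despite their low dimensionality.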