Protein structural class prediction based on an improved statistical strategy

Bibliographic Details
Main Authors: Gu, Fei; Chen, Hang; Ni, Jun
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2423446/
https://www.ncbi.nlm.nih.gov/pubmed/18541058
http://dx.doi.org/10.1186/1471-2105-9-S6-S5
author Gu, Fei
Chen, Hang
Ni, Jun
collection PubMed
description BACKGROUND: Protein structural class (PSC) is one of the most basic yet important classifications of protein structures, and techniques for predicting it have been under development for decades. Two popular kinds of index are those based on amino-acid frequency (AAF) and those based on amino-acid arrangement (AAA) with long-term correlation (LTC); both have been proposed in many works. Each index has its pros and cons: for example, the AAF index focuses on statistical analysis, while the AAA-LTC index emphasizes long-term biological significance. Unfortunately, the datasets used in previous work were not very reliable, owing to their small number of sequences and high sequence similarity. RESULTS: By modifying a statistical strategy, we proposed a new index method that combines probability and information theory with long-term correlation. We also proposed a numerically and biologically reliable dataset that includes more than 5700 sequences with low sequence similarity. The results showed that the proposed approach achieves high accuracy. Compared with the amino acid composition (AAC) index using a distance method, the accuracy of our approach improved by 16–20% in the re-substitution test and by about 6–11% in the cross-validation test. For the component coupled method (CCM), the corresponding values were about 23% and 15%. CONCLUSION: A new index method combining probability and information theory with long-term correlation was proposed in this paper. The statistical method was improved significantly based on our new index. A cross-validation test was conducted, and the results show that the proposed method yields a substantial improvement.
format Text
id pubmed-2423446
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-24234462008-06-11 Protein structural class prediction based on an improved statistical strategy Gu, Fei Chen, Hang Ni, Jun BMC Bioinformatics Research
BioMed Central 2008-05-28 /pmc/articles/PMC2423446/ /pubmed/18541058 http://dx.doi.org/10.1186/1471-2105-9-S6-S5 Text en Copyright © 2008 Gu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein structural class prediction based on an improved statistical strategy
topic Research