A fast SCOP fold classification system using content-based E-Predict algorithm

BACKGROUND: Domain experts manually construct the Structural Classification of Proteins (SCOP) database to categorize and compare protein structures. Although the SCOP database is believed to be more reliable than classification results from other methods, building it is labor intensive. To mimic hum...

Bibliographic Details
Main authors: Chi, Pin-Hao, Shyu, Chi-Ren, Xu, Dong
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1579235/
https://www.ncbi.nlm.nih.gov/pubmed/16872501
http://dx.doi.org/10.1186/1471-2105-7-362
author Chi, Pin-Hao
Shyu, Chi-Ren
Xu, Dong
collection PubMed
description BACKGROUND: Domain experts manually construct the Structural Classification of Proteins (SCOP) database to categorize and compare protein structures. Although the SCOP database is believed to be more reliable than classification results from other methods, building it is labor intensive. To mimic human classification processes, we develop an automatic SCOP fold classification system to assign possible known SCOP folds and recognize novel folds for newly discovered proteins. RESULTS: With a sufficient amount of ground truth data, our system is able to assign the known folds for newly discovered proteins in the latest SCOP v1.69 release with 92.17% accuracy. Our system also recognizes novel folds with 89.27% accuracy using 10-fold cross-validation. The average response time to complete the classification process for proteins with 500 and 1409 amino acids is 4.1 and 17.4 seconds, respectively. Compared with several structural alignment algorithms, our approach outperforms previous methods in both classification accuracy and efficiency. CONCLUSION: In this paper, we build an advanced, non-parametric classifier to accelerate the manual classification processes of SCOP. With satisfactory ground truth data from the SCOP database, our approach identifies relevant domain knowledge and yields reasonably accurate classifications. Our system is publicly accessible at .
format Text
id pubmed-1579235
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1579235 2006-10-02 A fast SCOP fold classification system using content-based E-Predict algorithm Chi, Pin-Hao Shyu, Chi-Ren Xu, Dong BMC Bioinformatics Software BioMed Central 2006-07-26 /pmc/articles/PMC1579235/ /pubmed/16872501 http://dx.doi.org/10.1186/1471-2105-7-362 Text en Copyright © 2006 Chi et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A fast SCOP fold classification system using content-based E-Predict algorithm
topic Software