
Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences


Bibliographic Details
Main authors: Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1552092/
https://www.ncbi.nlm.nih.gov/pubmed/16776838
http://dx.doi.org/10.1186/1471-2105-7-304
collection PubMed
description BACKGROUND: The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to close the gap between the number of new sequences produced and the number that receive reliable functional annotation. This work proposes rules for automatically classifying fungal genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationships between enzyme classes and significant InterPro entries. The candidate rules are then evaluated for their classificatory capacity. RESULTS: Five datasets were collected from Swiss-Prot to establish the annotation rules; these were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets, and a rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset, which lacks enzyme class descriptions, was also used to evaluate the fungus candidate rules. A total of 88 of 5085 test entries were matched by the fungus rule set; these entries were otherwise poorly annotated in their functional descriptions. CONCLUSION: The results show that the method presented here can feasibly assign enzyme classes based on enzyme domain rules. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation flowchart.
id pubmed-1552092
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-1552092 2006-08-24 BMC Bioinformatics Research Article
BioMed Central 2006-06-15. Copyright © 2006 Chiu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article
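The abstract above names Apriori and the InterPro-to-enzyme-class relationship but gives no implementation details, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. Assumptions: each entry is reduced to its set of InterPro accessions plus one EC class; rules are restricted to the form {InterPro entries} -> EC class; a test entry takes the class of the highest-confidence matching rule; and the correct-classification rate is counted over matched entries only. The function names, thresholds, and toy IDs (IPR_A, IPR_B, ...) are hypothetical.

```python
from collections import Counter
from itertools import combinations

CLASS_PREFIX = "EC="  # hypothetical marker so class items are not confused with InterPro IDs


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): count} for every itemset
    whose support (fraction of transactions containing it) is >= min_support."""
    min_count = min_support * len(transactions)
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_count}
    current, k = set(frequent), 2
    while current:
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        cand_counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c for c, n in cand_counts.items() if n >= min_count}
        frequent.update((c, cand_counts[c]) for c in current)
        k += 1
    return frequent


def mine_class_rules(entries, min_support, min_conf):
    """entries: list of (InterPro ID set, EC class). Return rules
    (antecedent, ec_class, confidence), highest confidence then most specific first."""
    transactions = [frozenset(ids) | {CLASS_PREFIX + ec} for ids, ec in entries]
    frequent = apriori(transactions, min_support)
    rules = []
    for itemset, count in frequent.items():
        labels = [i for i in itemset if i.startswith(CLASS_PREFIX)]
        if len(labels) != 1 or len(itemset) < 2:
            continue  # keep only itemsets with exactly one class item and >= 1 InterPro ID
        antecedent = itemset - {labels[0]}
        conf = count / frequent[antecedent]  # antecedent is frequent by downward closure
        if conf >= min_conf:
            rules.append((antecedent, labels[0][len(CLASS_PREFIX):], conf))
    rules.sort(key=lambda r: (-r[2], -len(r[0])))
    return rules


def classify(ipr_ids, rules):
    """Return the EC class of the first (best) rule whose antecedent is
    contained in ipr_ids, or None if no rule matches."""
    ids = set(ipr_ids)
    for antecedent, ec, _ in rules:
        if antecedent <= ids:
            return ec
    return None


def evaluate(test_entries, rules):
    """Return (matched, correct): entries hit by any rule, and how many of
    those hits predicted the annotated EC class."""
    matched = correct = 0
    for ipr_ids, true_ec in test_entries:
        predicted = classify(ipr_ids, rules)
        if predicted is not None:
            matched += 1
            correct += int(predicted == true_ec)
    return matched, correct


if __name__ == "__main__":
    # Toy data with hypothetical InterPro IDs; not taken from Swiss-Prot or TrEMBL.
    train = [({"IPR_A", "IPR_B"}, "1"), ({"IPR_A", "IPR_B", "IPR_C"}, "1"),
             ({"IPR_A"}, "1"), ({"IPR_D"}, "3"), ({"IPR_D", "IPR_C"}, "3"),
             ({"IPR_D", "IPR_E"}, "3")]
    test = [({"IPR_A", "IPR_X"}, "1"), ({"IPR_D"}, "3"), ({"IPR_B"}, "2"), ({"IPR_Z"}, "4")]
    rules = mine_class_rules(train, min_support=0.3, min_conf=0.8)
    for antecedent, ec, conf in rules:
        print(sorted(antecedent), "-> EC", ec, f"(confidence {conf:.2f})")
    matched, correct = evaluate(test, rules)
    print(f"{matched} of {len(test)} test entries matched; {correct} correctly classified")
```

On this toy data the sketch mines four rules and prints "3 of 4 test entries matched; 2 correctly classified". Picking the highest-confidence matching rule is just one simple way to resolve conflicting rules; the abstract does not say how the authors handled rule conflicts or how they defined the denominator of the classification rate.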