Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
Main authors: | Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1552092/ https://www.ncbi.nlm.nih.gov/pubmed/16776838 http://dx.doi.org/10.1186/1471-2105-7-304 |
author | Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou |
---|---|
collection | PubMed |
description | BACKGROUND: The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to narrow the gap between the number of new sequences produced and the number that receive reliable functional annotation. This work proposes rules for automatically classifying fungal genes. The approach involves elucidating the enzyme classification rules that are implicit in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationships between enzyme classes and significant InterPro entries. The candidate rules are then evaluated for their classificatory capacity. RESULTS: Five datasets were collected from Swiss-Prot to establish the annotation rules; these served as the training sets, and the TrEMBL entries served as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and of about 80% for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules: 88 of the 5085 test entries matched the fungus rule set, and these entries were otherwise poorly annotated by their functional descriptions. CONCLUSION: The results show that it is feasible to classify enzymes with the enzyme domain rules derived by this method. The rules may also be used by protein annotators in manual annotation or implemented in an automatic annotation workflow. |
format | Text |
id | pubmed-1552092 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1552092 2006-08-24 BMC Bioinformatics Research Article. BioMed Central 2006-06-15. Copyright © 2006 Chiu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences |
topic | Research Article |
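
The abstract names Apriori as the mining step but gives no thresholds, item encoding, or rule-selection policy, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each training entry is one transaction holding its InterPro entries plus a single EC-class item, keeps only rules whose consequent is one EC class, and assigns a test entry the class of its highest-confidence matching rule. The placeholder IDs (`IPR_A`, `IPR_B`, etc.), the thresholds `min_support=2` and `min_conf=0.8`, and the helper names `apriori`, `class_rules`, `classify`, and `evaluate` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): each training entry is one
# transaction holding placeholder InterPro-like IDs plus a single EC-class item.
TRAIN = [
    {"IPR_A", "IPR_B", "EC:1"},
    {"IPR_A", "IPR_B", "EC:1"},
    {"IPR_A", "EC:1"},
    {"IPR_C", "EC:2"},
    {"IPR_C", "IPR_D", "EC:2"},
    {"IPR_C", "IPR_D", "EC:2"},
    {"IPR_E", "EC:3"},
]

# Hypothetical test entries: (InterPro-like IDs, known EC class).
TEST = [
    ({"IPR_A", "IPR_X"}, "EC:1"),
    ({"IPR_C"}, "EC:2"),
    ({"IPR_D"}, "EC:1"),
    ({"IPR_E"}, "EC:3"),
]


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step (simple pairwise union, not the optimized prefix join).
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep only k-itemsets whose (k-1)-subsets are all frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count candidate supports with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def class_rules(itemsets, min_conf, is_class):
    """Keep rules {domains} -> class whose confidence is >= min_conf."""
    rules = []
    for itemset, count in itemsets.items():
        labels = [i for i in itemset if is_class(i)]
        if len(labels) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - {labels[0]}
        # The antecedent is a subset of a frequent itemset, so Apriori has counted it.
        confidence = count / itemsets[antecedent]
        if confidence >= min_conf:
            rules.append((antecedent, labels[0], confidence))
    return rules


def classify(domains, rules):
    """Return the class of the best matching rule, or None if no rule fires."""
    matching = [r for r in rules if r[0] <= domains]
    if not matching:
        return None
    # Prefer higher confidence, then more specific (longer) antecedents;
    # the label string is only a deterministic final tie-breaker.
    return max(matching, key=lambda r: (r[2], len(r[0]), r[1]))[1]


def evaluate(test_set, rules):
    """Return (entries matched by some rule, entries classified correctly)."""
    matched = correct = 0
    for domains, true_label in test_set:
        predicted = classify(domains, rules)
        if predicted is None:
            continue  # no rule fires: the entry stays unclassified
        matched += 1
        correct += predicted == true_label
    return matched, correct


if __name__ == "__main__":
    itemsets = apriori(TRAIN, min_support=2)
    rules = class_rules(itemsets, min_conf=0.8, is_class=lambda i: i.startswith("EC:"))
    matched, correct = evaluate(TEST, rules)
    print(f"{len(rules)} rules; {matched} matched; correct rate {correct / matched:.2f}")
    # prints: 6 rules; 3 matched; correct rate 0.67
```

In this sketch the correct rate is taken over matched entries only; the record does not say which denominator the reported 70% and 80% rates use.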