
Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences

BACKGROUND: The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the volume of new sequences produced and reliable functional...


Bibliographic Details
Main authors:	Chiu, Shih-Hau, Chen, Chien-Chi, Yuan, Gwo-Fang, Lin, Thy-Hou
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1552092/ https://www.ncbi.nlm.nih.gov/pubmed/16776838 http://dx.doi.org/10.1186/1471-2105-7-304

_version_	1782129344544505856
author	Chiu, Shih-Hau Chen, Chien-Chi Yuan, Gwo-Fang Lin, Thy-Hou
author_facet	Chiu, Shih-Hau Chen, Chien-Chi Yuan, Gwo-Fang Lin, Thy-Hou
author_sort	Chiu, Shih-Hau
collection	PubMed
description	BACKGROUND: The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the volume of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungal genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classification capacity. RESULTS: Five datasets were collected from Swiss-Prot to establish the annotation rules; these were treated as the training sets, and the TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and about 80% for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries matched the fungus rule set; these entries were otherwise poorly annotated by their functional descriptions. CONCLUSION: The results demonstrate the feasibility of classifying enzyme classes using the enzyme domain rules presented here. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation workflow.
format	Text
id	pubmed-1552092
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1552092 2006-08-24 Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences Chiu, Shih-Hau Chen, Chien-Chi Yuan, Gwo-Fang Lin, Thy-Hou BMC Bioinformatics Research Article BACKGROUND: The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the volume of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungal genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is used to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classification capacity. RESULTS: Five datasets were collected from Swiss-Prot to establish the annotation rules; these were treated as the training sets, and the TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and about 80% for the eukaryote datasets. The fungus training dataset, which lacks an enzyme class description, was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries matched the fungus rule set; these entries were otherwise poorly annotated by their functional descriptions. CONCLUSION: The results demonstrate the feasibility of classifying enzyme classes using the enzyme domain rules presented here. The rules may also be employed by protein annotators in manual annotation or implemented in an automatic annotation workflow. 
BioMed Central 2006-06-15 /pmc/articles/PMC1552092/ /pubmed/16776838 http://dx.doi.org/10.1186/1471-2105-7-304 Text en Copyright © 2006 Chiu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Chiu, Shih-Hau Chen, Chien-Chi Yuan, Gwo-Fang Lin, Thy-Hou Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
title	Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
title_full	Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
title_fullStr	Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
title_full_unstemmed	Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
title_short	Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
title_sort	association algorithm to mine the rules that govern enzyme definition and to classify protein sequences
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1552092/ https://www.ncbi.nlm.nih.gov/pubmed/16776838 http://dx.doi.org/10.1186/1471-2105-7-304
work_keys_str_mv	AT chiushihhau associationalgorithmtominetherulesthatgovernenzymedefinitionandtoclassifyproteinsequences AT chenchienchi associationalgorithmtominetherulesthatgovernenzymedefinitionandtoclassifyproteinsequences AT yuangwofang associationalgorithmtominetherulesthatgovernenzymedefinitionandtoclassifyproteinsequences AT linthyhou associationalgorithmtominetherulesthatgovernenzymedefinitionandtoclassifyproteinsequences
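
The abstract in this record describes mining associations between InterPro entries and enzyme classes with Apriori. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the function name `mine_enzyme_rules`, the thresholds, the identifiers `IPR_A`, `IPR_B`, `IPR_C`, and the toy records are all hypothetical. It grows sets of InterPro entries level by level, prunes any set that is not frequent, and keeps a rule "entries → enzyme class" when enough records support it and its confidence is high enough.

```python
from collections import Counter
from itertools import combinations


def mine_enzyme_rules(records, min_support=2, min_confidence=0.8, max_size=2):
    """Apriori-style mining of rules {InterPro entries} -> enzyme class.

    records: list of (set of InterPro ids, enzyme class label) pairs.
    Returns tuples (antecedent, enzyme_class, supporting_count, confidence).
    All thresholds are illustrative assumptions, not values from the paper.
    """
    rules = []
    # Level 1 candidates: every single InterPro entry seen in the data.
    items = sorted({ipr for iprs, _ in records for ipr in iprs})
    candidates = [frozenset([item]) for item in items]

    for size in range(1, max_size + 1):
        support = Counter()  # records containing the candidate itemset
        joint = Counter()    # records containing it AND carrying a given class
        for iprs, ec in records:
            for cand in candidates:
                if cand <= iprs:
                    support[cand] += 1
                    joint[(cand, ec)] += 1

        # Keep a rule when it is supported often enough and is confident enough.
        for (cand, ec), count in joint.items():
            confidence = count / support[cand]
            if count >= min_support and confidence >= min_confidence:
                rules.append((tuple(sorted(cand)), ec, count, confidence))

        # Apriori pruning: a larger itemset can only be frequent if every
        # subset one item smaller is frequent.
        frequent = {c for c in candidates if support[c] >= min_support}
        next_candidates = set()
        for a, b in combinations(sorted(frequent, key=sorted), 2):
            union = a | b
            if len(union) == size + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, size)
            ):
                next_candidates.add(union)
        candidates = sorted(next_candidates, key=sorted)

    return rules


# Hypothetical toy training set: (InterPro entries of a protein, enzyme class).
records = [
    ({"IPR_A", "IPR_B"}, "EC 1"),
    ({"IPR_A", "IPR_B"}, "EC 1"),
    ({"IPR_A"}, "EC 3"),
    ({"IPR_A", "IPR_C"}, "EC 3"),
    ({"IPR_C"}, "EC 3"),
]
rules = mine_enzyme_rules(records)
for antecedent, ec, count, confidence in rules:
    print(antecedent, "->", ec, f"(count={count}, confidence={confidence:.2f})")
```

In this toy data, `IPR_A` alone yields no rule because it is split evenly between two classes, which illustrates why a confidence threshold matters when a domain occurs across several enzyme classes.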
