Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers

We report on a new type of systematic annotation error in genome and pathway databases that results from the misinterpretation of partial Enzyme Commission (EC) numbers such as ‘1.1.1.-’. This error results in the assignment of genes annotated with a partial EC number to many or all biochemical reac...
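The faulty inference described in this summary can be illustrated with a minimal sketch. It is not the processing used by KEGG or any other database; the gene names, reaction names, and function names below are hypothetical, and the data are a toy example rather than values from the article. The sketch contrasts matching genes to reactions by identical EC string (which treats the partial number 1.1.1.- as if it were a specific activity) with a more cautious rule that declines to assign genes whose EC number is partial.

```python
from collections import defaultdict

# Toy data (hypothetical identifiers, not taken from any database).
GENE_EC = {
    "geneA": "1.1.1.-",   # partial EC number: exact activity unknown
    "geneB": "1.1.1.-",
    "geneC": "1.1.1.1",   # fully specified EC number
}
REACTION_EC = {
    "rxn1": "1.1.1.-",
    "rxn2": "1.1.1.-",
    "rxn3": "1.1.1.-",
    "rxn4": "1.1.1.1",
}


def is_partial(ec: str) -> bool:
    """Return True if any of the EC number's fields is the placeholder '-'."""
    return "-" in ec.split(".")


def _reactions_by_ec(reaction_ec):
    """Index reaction names by their EC annotation."""
    index = defaultdict(list)
    for rxn, ec in reaction_ec.items():
        index[ec].append(rxn)
    return index


def naive_assign(gene_ec, reaction_ec):
    """Faulty rule: assign each gene to every reaction with the same EC string."""
    index = _reactions_by_ec(reaction_ec)
    return {gene: sorted(index.get(ec, [])) for gene, ec in gene_ec.items()}


def cautious_assign(gene_ec, reaction_ec):
    """Safer rule (an assumption of this sketch): skip genes with partial EC numbers."""
    index = _reactions_by_ec(reaction_ec)
    return {
        gene: [] if is_partial(ec) else sorted(index.get(ec, []))
        for gene, ec in gene_ec.items()
    }


NAIVE = naive_assign(GENE_EC, REACTION_EC)
CAUTIOUS = cautious_assign(GENE_EC, REACTION_EC)
```

Under the naive rule, every gene annotated with 1.1.1.- is linked to every reaction annotated with 1.1.1.-, even though two entries sharing a partial EC number need not share an activity. Under the cautious rule, only the fully specified gene receives an assignment.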

Bibliographic Details
Main Authors: Green, M. L., Karp, P. D.
Format: Text
Language: English
Published: Oxford University Press 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1179732/
https://www.ncbi.nlm.nih.gov/pubmed/16034025
http://dx.doi.org/10.1093/nar/gki711
_version_ 1782124590931116032
author Green, M. L.
Karp, P. D.
author_facet Green, M. L.
Karp, P. D.
author_sort Green, M. L.
collection PubMed
description We report on a new type of systematic annotation error in genome and pathway databases that results from the misinterpretation of partial Enzyme Commission (EC) numbers such as ‘1.1.1.-’. This error results in the assignment of genes annotated with a partial EC number to many or all biochemical reactions that are annotated with the same partial EC number. That inference is faulty because of the ambiguous nature of partial EC numbers. We have observed this type of error in multiple databases, including KEGG, VIMSS and IMG, all of which assign genes to KEGG pathways. The Escherichia coli subset of the KEGG database exhibits this error for 6.8% of its gene-reaction assignments. For example, KEGG contains 17 reactions that are annotated with EC 1.1.1.-. A group of three E.coli genes, b1580 [putative dehydrogenase, NAD(P)-binding, starvation-sensing protein], b3787 (UDP-N-acetyl-d-mannosaminuronic acid dehydrogenase) and b0207 (2,5-diketo-d-gluconate reductase B), is assigned to 15 of those reactions, despite experimental evidence indicating different single functions for two of the three genes. Furthermore, the databases (DBs) are internally inconsistent in that the description of gene functions for genes with partial EC numbers is inconsistent with the activities implied by reactions to which the genes were assigned. We infer that these inconsistencies result from the processing used to match gene products to reactions within KEGG's metabolic pathways. These errors affect scientists who use these DBs as online encyclopedias and they affect bioinformaticists who use these DBs to train and validate newly developed algorithms.
format Text
id pubmed-1179732
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1179732 2005-07-22 Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers Green, M. L. Karp, P. D. Nucleic Acids Res Article Oxford University Press 2005 2005-07-20 /pmc/articles/PMC1179732/ /pubmed/16034025 http://dx.doi.org/10.1093/nar/gki711 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
spellingShingle Article
Green, M. L.
Karp, P. D.
Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers
title Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers
title_full Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers
title_fullStr Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers
title_full_unstemmed Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers
title_short Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers
title_sort genome annotation errors in pathway databases due to semantic ambiguity in partial ec numbers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1179732/
https://www.ncbi.nlm.nih.gov/pubmed/16034025
http://dx.doi.org/10.1093/nar/gki711
work_keys_str_mv AT greenml genomeannotationerrorsinpathwaydatabasesduetosemanticambiguityinpartialecnumbers
AT karppd genomeannotationerrorsinpathwaydatabasesduetosemanticambiguityinpartialecnumbers