
Grounding annotations in published literature with an emphasis on the functional roles used in metabolic models


Bibliographic Details
Main authors: Binter, Erik; Binter, Scott; Disz, Terry; Kalmanek, Elizabeth; Powers, Alexander; Pusch, Gordon D.; Turgeon, Julie
Format: Online article (text)
Language: English
Published: Springer Berlin Heidelberg, 2011
Subjects: Original Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3376863/
http://dx.doi.org/10.1007/s13205-011-0039-z
Collection: PubMed
Abstract: Accurate genome annotations in databases are a critical resource available to the scientific community for analysis and research. Inaccurate and inconsistent annotations arise from errors generated by mass automated annotation, and they currently act as a barrier to the application of bioinformatics. The purpose of this effort was to improve the SEED by strengthening the connection of functional roles to literature references. Direct literature references (DLits), found through searches of PubMed and other online databases such as SwissProt, were attached to protein sequences within the PubSEED to provide literature support for the roughly 2,500 distinct functional roles used to construct metabolic models within the Model SEED. Only DLits in which a researcher asserted the function of a protein were attached to sequences. Starting from a list of 1,072 functional roles that did not previously have DLit support, we connected sequences to literature for 655 functional roles, at least 484 of which were in the original list of unsupported roles. When added to the existing set of sequences having DLits, the resulting set of DLit-sequence pairs (the foundation set) now connects approximately 4,300 DLits to approximately 5,600 distinct protein sequences obtained from approximately 16,000 genes (some of these genes have identical protein sequences). From the foundation set, we constructed projection sets, each containing one member of the foundation set and projections of its functional role onto similar genes. The projection sets revealed 120 inconsistent annotations within the SEED. Two types of inconsistencies were corrected through manual annotation in the PubSEED: instances in which two identical protein sequences had been annotated with different functions, and instances in which projected functions contradicted previous annotations. In total, 26,785 changes to gene function assignments, 219 of which were to previously uncharacterized proteins, resulted in a more consistent and accurate set of input data from which to construct revised metabolic models within the Model SEED.
Electronic supplementary material: The online version of this article (doi:10.1007/s13205-011-0039-z) contains supplementary material, which is available to authorized users.
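The two kinds of inconsistency named in the abstract can be illustrated with a small Python sketch. It is not the PubSEED's code: the `Gene` record, the toy sequences, and the `similar` map (standing in for whatever similarity search produced the projections) are hypothetical, and an empty function string is assumed to mean an uncharacterized protein.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str       # hypothetical gene identifier
    protein_seq: str   # translated protein sequence
    function: str      # assigned functional role; "" means uncharacterized


def identical_sequence_conflicts(genes):
    """Type 1: identical protein sequences annotated with different functions.

    Returns {protein_seq: {roles}} for every sequence that carries
    more than one distinct (non-empty) functional role.
    """
    roles_by_seq = defaultdict(set)
    for gene in genes:
        if gene.function:  # skip uncharacterized proteins
            roles_by_seq[gene.protein_seq].add(gene.function)
    return {seq: roles for seq, roles in roles_by_seq.items() if len(roles) > 1}


def projection_conflicts(foundation, similar, genes_by_id):
    """Type 2: a projected role contradicts an existing annotation.

    foundation:  {gene_id: DLit-supported functional role}
    similar:     {gene_id: [gene_ids judged similar]}  (hypothetical input)
    genes_by_id: {gene_id: Gene}

    Returns (source_id, target_id, projected_role, current_role) tuples.
    Uncharacterized targets are not reported as conflicts; in this sketch
    they would be candidates for a new assignment instead.
    """
    conflicts = []
    for source_id, role in foundation.items():
        for target_id in similar.get(source_id, []):
            current = genes_by_id[target_id].function
            if current and current != role:
                conflicts.append((source_id, target_id, role, current))
    return conflicts


# Toy data for illustration only.
genes = [
    Gene("g1", "MKTAYIAK", "Enzyme A"),
    Gene("g2", "MKTAYIAK", "Enzyme B"),  # same sequence as g1, different role
    Gene("g3", "MSSLVKQ", "Enzyme A"),
    Gene("g4", "MSSLVKR", ""),           # uncharacterized
]
by_id = {g.gene_id: g for g in genes}
print(identical_sequence_conflicts(genes))
print(projection_conflicts({"g3": "Enzyme A"}, {"g3": ["g2", "g4"]}, by_id))
```

Grouping by exact sequence makes the first check a single pass over the genes; the second check depends entirely on how the `similar` map is built, which the abstract does not specify.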
Record ID: pubmed-3376863
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: 3 Biotech (Original Article), Springer Berlin Heidelberg; published online 2011-12-14, issue dated 2012-06; PubMed record pubmed-3376863 (2012-09-11)
Rights: © The Author(s) 2011. This article is published under license to BioMed Central Ltd. Open Access: this article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.