In search of genome annotation consistency: solid gene clusters and how to use them

Bibliographic Details
Main Authors: Davis, James J., Olsen, Gary J., Overbeek, Ross, Vonstein, Veronika, Xia, Fangfang
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2013
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4026451/
https://www.ncbi.nlm.nih.gov/pubmed/28324432
http://dx.doi.org/10.1007/s13205-013-0152-2
Description
Summary: Maintaining consistency in genome annotations is important for supporting many computational tasks, particularly metabolic modeling. The SEED project has implemented a process that improves annotation consistencies across microbial genomes for proteins with conserved sequences and genomic context. In this research report, we describe this process and show how this effort has resulted in improvements to microbial genome annotations in the SEED. We also compare SEED annotation consistencies with other commonly used resources such as IMG (the Joint Genome Institute’s Integrated Microbial Genomes system), RefSeq (the National Center for Biotechnology Information’s Reference Sequence Database), Swiss-Prot (the annotated protein sequence database of the Swiss Institute of Bioinformatics, European Molecular Biology Laboratory and the European Bioinformatics Institute) and TrEMBL (Translated European Molecular Biology Laboratory nucleotide sequence data Library). Our analysis indicates that manual and computational efforts are paying off for the databases where consistency is a major goal.