
Correction of the Caulobacter crescentus NA1000 Genome Annotation


Bibliographic Details
Main authors: Ely, Bert; Scott, LaTia Etheredge
Format: Online Article Text
Language: English
Published: Public Library of Science, 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3951458/
https://www.ncbi.nlm.nih.gov/pubmed/24621776
http://dx.doi.org/10.1371/journal.pone.0091668
Collection: PubMed
Description: Bacterial genome annotations are accumulating rapidly in the GenBank database, and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly result in a small but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 noncoding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in a reduction in the use of rare codons, since noncoding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to it. Thus, a more accurate codon usage table was generated as well. These results demonstrate that comparing the locations of peaks in third codon position GC content with the locations of protein-coding regions could be used to verify the annotation of any genome with a GC content greater than 60%.
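The abstract describes verifying annotations by comparing peaks in third codon position GC content (GC3) with the annotated reading frames; the authors did this by manual inspection in Artemis and MICheck. The Python sketch below is a hypothetical, simplified automation of that comparison, not the authors' procedure. It assumes a genome above 60% GC, 0-based half-open CDS coordinates, and that a correctly framed coding region shows its GC peak at the third codon position. All function names, and the `window` and `step` defaults of `frame_plot`, are illustrative assumptions rather than Artemis settings.

```python
"""Hypothetical sketch: third codon position GC (GC3) checks for CDS annotation.

Assumption: in a high-GC genome (> 60% GC overall), the true third codon
position of a protein-coding region has the highest GC content of the
three reading-frame offsets."""
from typing import Dict, List, Tuple

_GC = frozenset("GCgc")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_fraction(bases: str) -> float:
    """Fraction of G or C in `bases` (0.0 for an empty string)."""
    return sum(b in _GC for b in bases) / len(bases) if bases else 0.0


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def method_applies(genome: str, threshold: float = 0.60) -> bool:
    """The GC3 comparison is only expected to be informative above ~60% GC."""
    return gc_fraction(genome) > threshold


def gc3_by_offset(region: str) -> Tuple[float, ...]:
    """GC content of every third base, starting at offsets 0, 1 and 2.

    When `region` starts at a codon's first base, offset 2 holds the
    third codon positions."""
    return tuple(gc_fraction(region[k::3]) for k in range(3))


def frame_plot(seq: str, window: int = 120, step: int = 3
               ) -> List[Tuple[int, Tuple[float, ...]]]:
    """Sliding-window GC content at positions p with p % 3 == f, for f = 0, 1, 2.

    Frames are absolute (relative to seq[0]), so they stay aligned for any
    step size. The defaults are illustrative, not Artemis settings."""
    points = []
    for i in range(0, len(seq) - window + 1, step):
        stop = i + window
        # First position >= i that lies in absolute frame f.
        values = tuple(gc_fraction(seq[i + (f - i) % 3:stop:3]) for f in range(3))
        points.append((i, values))
    return points


def check_cds(genome: str, start: int, end: int, strand: str = "+") -> Dict[str, object]:
    """Flag a CDS (0-based, half-open [start, end)) whose annotated frame
    does not place the GC peak at the third codon position."""
    region = genome[start:end]
    if strand == "-":
        region = reverse_complement(region)
    gc = gc3_by_offset(region)
    peak = max(range(3), key=lambda k: gc[k])
    return {"gc_by_offset": gc, "peak_offset": peak, "frame_ok": peak == 2}


if __name__ == "__main__":
    cds = "AAG" * 10  # toy CDS: every third codon position is G
    print(check_cds(cds, 0, 30)["frame_ok"])        # True
    print(check_cds("T" + cds, 0, 30)["frame_ok"])  # False: frame shifted by one base
```

A region flagged with `frame_ok` set to False would only be a candidate for manual review, since the abstract also relies on matches in the GenBank database before changing an annotation.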
Record ID: pubmed-3951458
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), Public Library of Science, published 2014-03-12 (record pubmed-3951458, 2014-03-13)
Rights: © 2014 Ely, Scott. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.