A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites

Bibliographic Details
Main authors: Overmars, Lex; Siezen, Roland J.; Francke, Christof
Format: Online Article Text
Language: English
Published: Public Library of Science, 2015
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4512697/
https://www.ncbi.nlm.nih.gov/pubmed/26204119
http://dx.doi.org/10.1371/journal.pone.0133691
collection PubMed
description The identification of translation initiation sites (TISs) is an important aspect of sequence-based genome analysis. For any particular gene, an erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and may also compromise the determination of descent. We have formulated a reference-free method to score TIS annotation quality. The method is based on a comparison of the observed and expected distribution of all TISs in a particular genome, given a prior gene calling. We have assessed the TIS annotations for all available NCBI RefSeq microbial genomes and found that approximately 87% are of appropriate quality, whereas 13% need substantial improvement. We have analyzed a number of factors that could affect TIS annotation quality, such as GC-content, taxonomy, the fraction of genes with a Shine-Dalgarno sequence and the year of publication. The analysis showed that only the first of these, GC-content, has a clear effect. We have then formulated a straightforward Principal Component Analysis (PCA)-based TIS identification strategy to self-organize and score potential TISs. The strategy is independent of reference data and a priori calculations. A representative set of 277 genomes was subjected to the analysis, and we found a clear increase in TIS annotation quality for the genomes with a low quality score. The PCA-based annotation was also compared with the annotation produced by the current reference tool, Prodigal. The comparison for the model genome Escherichia coli K12 showed that the two methods complement each other and that agreement between their predictions can be used as an indicator of a correct TIS annotation. Importantly, the data suggest that combining a PCA-based strategy with a Prodigal prediction can be used to ‘flag’ TIS annotations for re-evaluation, and that the PCA-based strategy can also be used to evaluate a given annotation when no Prodigal annotation is available.
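As a hedged illustration of the final points of the description (agreement between two predictors as an indicator of a correct TIS, and flagging annotations for re-evaluation), the Python sketch below compares per-gene start coordinates from an existing annotation, a PCA-based prediction and, optionally, a Prodigal prediction. It does not implement the authors' PCA procedure or their quality score; the data model (a dict from a hypothetical gene identifier to a start-codon coordinate), the function names `agreement_rate` and `flag_annotations`, and the flagging rule are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical data model (not from the paper or from Prodigal's output):
# gene identifier -> start-codon coordinate, e.g. the genomic position of the
# first base of the predicted start codon. Strand handling is left out.
TisCalls = Dict[str, int]


def agreement_rate(calls_a: TisCalls, calls_b: TisCalls) -> float:
    """Fraction of genes called by both predictors whose TIS coordinates match."""
    shared = set(calls_a) & set(calls_b)
    if not shared:
        return 0.0
    matches = sum(1 for gene in shared if calls_a[gene] == calls_b[gene])
    return matches / len(shared)


def flag_annotations(
    annotation: TisCalls,
    pca_calls: TisCalls,
    prodigal_calls: Optional[TisCalls] = None,
) -> List[str]:
    """Return, sorted, the annotated genes whose TIS should be re-evaluated.

    Illustrative rule (an assumption, not the authors' procedure):
    - if both predictors called the gene, flag it when they disagree with each
      other, or when they agree but the annotation differs from their call;
    - if only the PCA-based call exists (no Prodigal call), flag the gene when
      the annotation differs from that call.
    Genes without a PCA-based call are skipped.
    """
    flagged = []
    for gene, annotated_start in annotation.items():
        pca_start = pca_calls.get(gene)
        if pca_start is None:
            continue  # nothing to compare against
        if prodigal_calls is None or gene not in prodigal_calls:
            if annotated_start != pca_start:
                flagged.append(gene)
        elif pca_start != prodigal_calls[gene] or annotated_start != pca_start:
            flagged.append(gene)
    return sorted(flagged)


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    annotation = {"geneA": 100, "geneB": 250, "geneC": 400}
    pca = {"geneA": 100, "geneB": 262, "geneC": 400}
    prodigal = {"geneA": 100, "geneB": 250, "geneC": 412}
    print(agreement_rate(pca, prodigal))                # 0.333...
    print(flag_annotations(annotation, pca, prodigal))  # ['geneB', 'geneC']
    print(flag_annotations(annotation, pca))            # ['geneB']
```

In the toy run, geneB and geneC are flagged when both predictors are available because the predictors disagree on them; without Prodigal calls only geneB is flagged, since only there does the annotation differ from the PCA-based call. A real pipeline would also have to handle strand and genes that one predictor misses.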
id pubmed-4512697
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4512697 2015-07-24 PLoS One Research Article Public Library of Science 2015-07-23 © 2015 Overmars et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.