A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites
The identification of translation initiation sites (TISs) constitutes an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and also may flaw the determination of descent, for any partic...
Main Authors: | Overmars, Lex; Siezen, Roland J.; Francke, Christof |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4512697/ https://www.ncbi.nlm.nih.gov/pubmed/26204119 http://dx.doi.org/10.1371/journal.pone.0133691 |
_version_ | 1782382546624970752 |
---|---|
author | Overmars, Lex Siezen, Roland J. Francke, Christof |
author_facet | Overmars, Lex Siezen, Roland J. Francke, Christof |
author_sort | Overmars, Lex |
collection | PubMed |
description | The identification of translation initiation sites (TISs) constitutes an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and also may flaw the determination of descent, for any particular gene. We have formulated a reference-free method to score the TIS annotation quality. The method is based on a comparison of the observed and expected distribution of all TISs in a particular genome given prior gene-calling. We have assessed the TIS annotations for all available NCBI RefSeq microbial genomes and found that approximately 87% are of appropriate quality, whereas 13% need substantial improvement. We have analyzed a number of factors that could affect TIS annotation quality such as GC-content, taxonomy, the fraction of genes with a Shine-Dalgarno sequence and the year of publication. The analysis showed that only the first factor has a clear effect. We have then formulated a straightforward Principal Component Analysis-based TIS identification strategy to self-organize and score potential TISs. The strategy is independent of reference data and a priori calculations. A representative set of 277 genomes was subjected to the analysis and we found a clear increase in TIS annotation quality for the genomes with a low quality score. The PCA-based annotation was also compared with annotation with the current tool of reference, Prodigal. The comparison for the model genome of Escherichia coli K12 showed that both methods supplement each other and that prediction agreement can be used as an indicator of a correct TIS annotation. Importantly, the data suggest that the addition of a PCA-based strategy to a Prodigal prediction can be used to ‘flag’ TIS annotations for re-evaluation and in addition can be used to evaluate a given annotation in case a Prodigal annotation is lacking. |
format | Online Article Text |
id | pubmed-4512697 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4512697 2015-07-24 A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites Overmars, Lex Siezen, Roland J. Francke, Christof PLoS One Research Article The identification of translation initiation sites (TISs) constitutes an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and also may flaw the determination of descent, for any particular gene. We have formulated a reference-free method to score the TIS annotation quality. The method is based on a comparison of the observed and expected distribution of all TISs in a particular genome given prior gene-calling. We have assessed the TIS annotations for all available NCBI RefSeq microbial genomes and found that approximately 87% are of appropriate quality, whereas 13% need substantial improvement. We have analyzed a number of factors that could affect TIS annotation quality such as GC-content, taxonomy, the fraction of genes with a Shine-Dalgarno sequence and the year of publication. The analysis showed that only the first factor has a clear effect. We have then formulated a straightforward Principal Component Analysis-based TIS identification strategy to self-organize and score potential TISs. The strategy is independent of reference data and a priori calculations. A representative set of 277 genomes was subjected to the analysis and we found a clear increase in TIS annotation quality for the genomes with a low quality score. The PCA-based annotation was also compared with annotation with the current tool of reference, Prodigal. The comparison for the model genome of Escherichia coli K12 showed that both methods supplement each other and that prediction agreement can be used as an indicator of a correct TIS annotation. Importantly, the data suggest that the addition of a PCA-based strategy to a Prodigal prediction can be used to ‘flag’ TIS annotations for re-evaluation and in addition can be used to evaluate a given annotation in case a Prodigal annotation is lacking. Public Library of Science 2015-07-23 /pmc/articles/PMC4512697/ /pubmed/26204119 http://dx.doi.org/10.1371/journal.pone.0133691 Text en © 2015 Overmars et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Overmars, Lex Siezen, Roland J. Francke, Christof A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites |
title | A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites |
title_full | A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites |
title_fullStr | A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites |
title_full_unstemmed | A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites |
title_short | A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites |
title_sort | novel quality measure and correction procedure for the annotation of microbial translation initiation sites |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4512697/ https://www.ncbi.nlm.nih.gov/pubmed/26204119 http://dx.doi.org/10.1371/journal.pone.0133691 |
work_keys_str_mv | AT overmarslex anovelqualitymeasureandcorrectionprocedurefortheannotationofmicrobialtranslationinitiationsites AT siezenrolandj anovelqualitymeasureandcorrectionprocedurefortheannotationofmicrobialtranslationinitiationsites AT franckechristof anovelqualitymeasureandcorrectionprocedurefortheannotationofmicrobialtranslationinitiationsites AT overmarslex novelqualitymeasureandcorrectionprocedurefortheannotationofmicrobialtranslationinitiationsites AT siezenrolandj novelqualitymeasureandcorrectionprocedurefortheannotationofmicrobialtranslationinitiationsites AT franckechristof novelqualitymeasureandcorrectionprocedurefortheannotationofmicrobialtranslationinitiationsites |