Cargando…

Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction

Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines provide users a straight-forward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to d...

Descripción completa

Detalles Bibliográficos
Autores principales: Ederveen, Thomas H. A., Overmars, Lex, van Hijum, Sacha A. F. T.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3651085/
https://www.ncbi.nlm.nih.gov/pubmed/23675487
http://dx.doi.org/10.1371/journal.pone.0063523
_version_ 1782269161370550272
author Ederveen, Thomas H. A.
Overmars, Lex
van Hijum, Sacha A. F. T.
author_facet Ederveen, Thomas H. A.
Overmars, Lex
van Hijum, Sacha A. F. T.
author_sort Ederveen, Thomas H. A.
collection PubMed
description Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines provide users a straight-forward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF) calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35–52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating likeliness of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs notoriously difficult to predict start codons. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%) and with an optimal path ORF start prediction sensitivity is gained while maintaining a high specificity.
format Online
Article
Text
id pubmed-3651085
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-36510852013-05-14 Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction Ederveen, Thomas H. A. Overmars, Lex van Hijum, Sacha A. F. T. PLoS One Research Article Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines provide users a straight-forward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF) calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35–52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating likeliness of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs notoriously difficult to predict start codons. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%) and with an optimal path ORF start prediction sensitivity is gained while maintaining a high specificity. Public Library of Science 2013-05-10 /pmc/articles/PMC3651085/ /pubmed/23675487 http://dx.doi.org/10.1371/journal.pone.0063523 Text en © 2013 Ederveen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Ederveen, Thomas H. A.
Overmars, Lex
van Hijum, Sacha A. F. T.
Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction
title Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction
title_full Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction
title_fullStr Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction
title_full_unstemmed Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction
title_short Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction
title_sort reduce manual curation by combining gene predictions from multiple annotation engines, a case study of start codon prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3651085/
https://www.ncbi.nlm.nih.gov/pubmed/23675487
http://dx.doi.org/10.1371/journal.pone.0063523
work_keys_str_mv AT ederveenthomasha reducemanualcurationbycombininggenepredictionsfrommultipleannotationenginesacasestudyofstartcodonprediction
AT overmarslex reducemanualcurationbycombininggenepredictionsfrommultipleannotationenginesacasestudyofstartcodonprediction
AT vanhijumsachaaft reducemanualcurationbycombininggenepredictionsfrommultipleannotationenginesacasestudyofstartcodonprediction