
Toward a standard in structural genome annotation for prokaryotes

BACKGROUND: In an effort to identify the best practice for finding genes in prokaryotic genomes and propose it as a standard for automated annotation pipelines, 1,004,576 peptides were collected from various publicly available resources, and were used as a basis to evaluate various gene-calling meth...

Bibliographic Details
Main authors: Tripp, H. James, Sutton, Granger, White, Owen, Wortman, Jennifer, Pati, Amrita, Mikhailova, Natalia, Ovchinnikova, Galina, Payne, Samuel H., Kyrpides, Nikos C., Ivanova, Natalia
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4572445/
https://www.ncbi.nlm.nih.gov/pubmed/26380633
http://dx.doi.org/10.1186/s40793-015-0034-9
author Tripp, H. James
Sutton, Granger
White, Owen
Wortman, Jennifer
Pati, Amrita
Mikhailova, Natalia
Ovchinnikova, Galina
Payne, Samuel H.
Kyrpides, Nikos C.
Ivanova, Natalia
author_sort Tripp, H. James
collection PubMed
description BACKGROUND: In an effort to identify the best practice for finding genes in prokaryotic genomes and propose it as a standard for automated annotation pipelines, 1,004,576 peptides were collected from various publicly available resources, and were used as a basis to evaluate various gene-calling methods. The peptides came from 45 bacterial replicons with an average GC content from 31 % to 74 %, biased toward higher GC content genomes. Automated, manual, and semi-manual methods were used to tally errors in three widely used gene calling methods, as evidenced by peptides mapped outside the boundaries of called genes. RESULTS: We found that the consensus set of identical genes predicted by the three methods constitutes only about 70 % of the genes predicted by each individual method (with start and stop required to coincide). Peptide data was useful for evaluating some of the differences between gene callers, but not reliable enough to make the results conclusive, due to limitations inherent in any proteogenomic study. CONCLUSIONS: A single, unambiguous, unanimous best practice did not emerge from this analysis, since the available proteomics data were not adequate to provide an objective measurement of differences in the accuracy between these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of a sufficient amount of experimental data to achieve a universal standard, our recommendation is that any of these methods can be used by the community, as long as a single method is employed across all datasets to be compared. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40793-015-0034-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4572445
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4572445 2015-09-18 Toward a standard in structural genome annotation for prokaryotes Stand Genomic Sci Research BioMed Central 2015-07-25 /pmc/articles/PMC4572445/ /pubmed/26380633 http://dx.doi.org/10.1186/s40793-015-0034-9 Text en © Tripp et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Toward a standard in structural genome annotation for prokaryotes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4572445/
https://www.ncbi.nlm.nih.gov/pubmed/26380633
http://dx.doi.org/10.1186/s40793-015-0034-9