
Impact of analytic provenance in genome analysis

BACKGROUND: Many computational methods are available for assembly and annotation of newly sequenced microbial genomes. However, when new genomes are reported in the literature, there is frequently very little critical analysis of choices made during the sequence assembly and gene annotation stages....


Bibliographic Details
Main Authors: Morrison, Shatavia S, Pyzh, Roman, Jeon, Myung S, Amaro, Carmen, Roig, Francisco J, Baker-Austin, Craig, Oliver, James D, Gibas, Cynthia J
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248810/
https://www.ncbi.nlm.nih.gov/pubmed/25435180
http://dx.doi.org/10.1186/1471-2164-15-S8-S1
_version_ 1782346847178719232
author Morrison, Shatavia S
Pyzh, Roman
Jeon, Myung S
Amaro, Carmen
Roig, Francisco J
Baker-Austin, Craig
Oliver, James D
Gibas, Cynthia J
author_sort Morrison, Shatavia S
collection PubMed
description BACKGROUND: Many computational methods are available for assembly and annotation of newly sequenced microbial genomes. However, when new genomes are reported in the literature, there is frequently very little critical analysis of choices made during the sequence assembly and gene annotation stages. These choices have a direct impact on the biologically relevant products of a genomic analysis - for instance, identification of common and differentiating regions among genomes in a comparison, or identification of enriched gene functional categories in a specific strain. Here, we examine the outcomes of different assembly and analysis steps in typical workflows in a comparison among strains of Vibrio vulnificus.
RESULTS: Using six recently sequenced strains of V. vulnificus, we demonstrate the "alternate realities" of comparative genomics, and how they depend on the choice of a robust assembly method and accurate ab initio annotation. We apply several popular assemblers for paired-end Illumina data, and three well-regarded ab initio gene finders. We demonstrate significant differences in detected gene overlap among comparative genomics workflows that depend on these two steps. The divergence between workflows, even those using widely adopted methods, is obvious both at the single genome level and when a comparison is performed. In a typical example where multiple workflows are applied to the strain V. vulnificus CECT 4606, a workflow that uses the Velvet assembler and Glimmer gene finder identifies 3275 gene features, while a workflow that uses the Velvet assembler and the RAST annotation system identifies 5011 gene features. Only 3171 genes are identical between the two workflows. When we examine 9 assembly/annotation workflow scenarios as input to a three-way genome comparison, differentiating genes and even differentially represented functional categories change significantly from scenario to scenario.
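The CECT 4606 figures quoted above (3275 and 5011 gene features, 3171 identical) can be turned into simple overlap measures. The following is a minimal sketch, not the authors' method: the function name `overlap_summary` and its keys are hypothetical, and it assumes each identical gene is counted exactly once in each workflow's total, which the record does not state explicitly.

```python
def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarize the overlap between two gene sets given only their sizes.

    n_a, n_b  -- number of gene features reported by workflow A and B
    n_shared  -- number of genes reported identically by both
    Assumes every shared gene is counted once in each workflow's total.
    """
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either set size")
    union = n_a + n_b - n_shared  # inclusion-exclusion
    return {
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "union": union,
        "frac_a_shared": n_shared / n_a,
        "frac_b_shared": n_shared / n_b,
        "jaccard": n_shared / union,
    }


# Counts taken from the abstract: Velvet+Glimmer vs. Velvet+RAST on CECT 4606.
summary = overlap_summary(n_a=3275, n_b=5011, n_shared=3171)
for key, value in summary.items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

Under that assumption, 104 genes are unique to the Velvet/Glimmer workflow and 1840 to the Velvet/RAST workflow, so roughly 97% of the Glimmer calls but only about 63% of the RAST calls are shared, for a Jaccard index near 0.62.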
CONCLUSIONS: Inconsistencies in genomic analysis can arise depending on the choices that are made during the assembly and annotation stages. These inconsistencies can have a significant impact on the interpretation of an individual genome's content. The impact is multiplied when comparison of content and function among multiple genomes is the goal. Tracking the analysis history of the data - its analytic provenance - is critical for reproducible analysis of genome data.
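One way to picture "tracking the analysis history" is a small machine-readable record of each step in a workflow. The sketch below is illustrative only and is not taken from the article: the `Step` class, its fields, and `workflow_id` are hypothetical names, and tool versions and parameters are left unspecified because the record does not give them.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Step:
    """One stage of an analysis workflow (hypothetical schema)."""
    stage: str                      # e.g. "assembly" or "annotation"
    tool: str                       # tool name as reported by the analyst
    version: str = "unspecified"    # not given in the record; fill in when known
    parameters: dict = field(default_factory=dict)


def workflow_id(steps: list) -> str:
    """Derive a short, deterministic identifier from the ordered steps."""
    payload = json.dumps([asdict(s) for s in steps], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# The two workflows compared in the abstract, with versions left unspecified.
velvet_glimmer = [Step("assembly", "Velvet"), Step("annotation", "Glimmer")]
velvet_rast = [Step("assembly", "Velvet"), Step("annotation", "RAST")]

print(workflow_id(velvet_glimmer))
print(workflow_id(velvet_rast))
```

Storing such an identifier alongside each gene list makes it explicit which assembler and annotator produced it, so that results from different workflows are not compared as if they were equivalent.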
format Online
Article
Text
id pubmed-4248810
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4248810 2014-12-04 Impact of analytic provenance in genome analysis Morrison, Shatavia S Pyzh, Roman Jeon, Myung S Amaro, Carmen Roig, Francisco J Baker-Austin, Craig Oliver, James D Gibas, Cynthia J BMC Genomics Proceedings BACKGROUND: Many computational methods are available for assembly and annotation of newly sequenced microbial genomes. However, when new genomes are reported in the literature, there is frequently very little critical analysis of choices made during the sequence assembly and gene annotation stages. These choices have a direct impact on the biologically relevant products of a genomic analysis - for instance, identification of common and differentiating regions among genomes in a comparison, or identification of enriched gene functional categories in a specific strain. Here, we examine the outcomes of different assembly and analysis steps in typical workflows in a comparison among strains of Vibrio vulnificus. RESULTS: Using six recently sequenced strains of V. vulnificus, we demonstrate the "alternate realities" of comparative genomics, and how they depend on the choice of a robust assembly method and accurate ab initio annotation. We apply several popular assemblers for paired-end Illumina data, and three well-regarded ab initio gene finders. We demonstrate significant differences in detected gene overlap among comparative genomics workflows that depend on these two steps. The divergence between workflows, even those using widely adopted methods, is obvious both at the single genome level and when a comparison is performed. In a typical example where multiple workflows are applied to the strain V. vulnificus CECT 4606, a workflow that uses the Velvet assembler and Glimmer gene finder identifies 3275 gene features, while a workflow that uses the Velvet assembler and the RAST annotation system identifies 5011 gene features. Only 3171 genes are identical between the two workflows. When we examine 9 assembly/annotation workflow scenarios as input to a three-way genome comparison, differentiating genes and even differentially represented functional categories change significantly from scenario to scenario. CONCLUSIONS: Inconsistencies in genomic analysis can arise depending on the choices that are made during the assembly and annotation stages. These inconsistencies can have a significant impact on the interpretation of an individual genome's content. The impact is multiplied when comparison of content and function among multiple genomes is the goal. Tracking the analysis history of the data - its analytic provenance - is critical for reproducible analysis of genome data. BioMed Central 2014-11-13 /pmc/articles/PMC4248810/ /pubmed/25435180 http://dx.doi.org/10.1186/1471-2164-15-S8-S1 Text en Copyright © 2014 Morrison et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Impact of analytic provenance in genome analysis
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248810/
https://www.ncbi.nlm.nih.gov/pubmed/25435180
http://dx.doi.org/10.1186/1471-2164-15-S8-S1