Automated ensemble assembly and validation of microbial genomes

BACKGROUND: The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practic...

Bibliographic Details
Main authors: Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4030574/
https://www.ncbi.nlm.nih.gov/pubmed/24884846
http://dx.doi.org/10.1186/1471-2105-15-126
author Koren, Sergey
Treangen, Todd J
Hill, Christopher M
Pop, Mihai
Phillippy, Adam M
author_facet Koren, Sergey
Treangen, Todd J
Hill, Christopher M
Pop, Mihai
Phillippy, Adam M
author_sort Koren, Sergey
collection PubMed
description BACKGROUND: The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. RESULTS: To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. CONCLUSIONS: Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. 
Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
format Online
Article
Text
id pubmed-4030574
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40305742014-05-23 Automated ensemble assembly and validation of microbial genomes Koren, Sergey Treangen, Todd J Hill, Christopher M Pop, Mihai Phillippy, Adam M BMC Bioinformatics Software BACKGROUND: The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. RESULTS: To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. 
CONCLUSIONS: Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs. BioMed Central 2014-05-03 /pmc/articles/PMC4030574/ /pubmed/24884846 http://dx.doi.org/10.1186/1471-2105-15-126 Text en Copyright © 2014 Koren et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Koren, Sergey
Treangen, Todd J
Hill, Christopher M
Pop, Mihai
Phillippy, Adam M
Automated ensemble assembly and validation of microbial genomes
title Automated ensemble assembly and validation of microbial genomes
title_full Automated ensemble assembly and validation of microbial genomes
title_fullStr Automated ensemble assembly and validation of microbial genomes
title_full_unstemmed Automated ensemble assembly and validation of microbial genomes
title_short Automated ensemble assembly and validation of microbial genomes
title_sort automated ensemble assembly and validation of microbial genomes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4030574/
https://www.ncbi.nlm.nih.gov/pubmed/24884846
http://dx.doi.org/10.1186/1471-2105-15-126
work_keys_str_mv AT korensergey automatedensembleassemblyandvalidationofmicrobialgenomes
AT treangentoddj automatedensembleassemblyandvalidationofmicrobialgenomes
AT hillchristopherm automatedensembleassemblyandvalidationofmicrobialgenomes
AT popmihai automatedensembleassemblyandvalidationofmicrobialgenomes
AT phillippyadamm automatedensembleassemblyandvalidationofmicrobialgenomes