Automated ensemble assembly and validation of microbial genomes
Main authors: | Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4030574/ https://www.ncbi.nlm.nih.gov/pubmed/24884846 http://dx.doi.org/10.1186/1471-2105-15-126 |
author | Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M |
collection | PubMed |
description | BACKGROUND: The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. RESULTS: To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. CONCLUSIONS: Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs. |
id | pubmed-4030574 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-4030574, 2014-05-23. BMC Bioinformatics (Software), BioMed Central, 2014-05-03. Copyright © 2014 Koren et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Software |
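The abstract describes scoring each candidate assembly on several validation metrics and then selecting a single assembly. As an illustration only, the Python sketch below ranks hypothetical assemblies by the sum of their per-metric ranks. The metric names, the example values, and the rank-sum rule are all assumptions made for this example; they are not taken from iMetAMOS, whose actual scoring scheme may differ.

```python
from typing import Dict

# HYPOTHETICAL metric set for illustration; not iMetAMOS's actual metrics.
# True = higher values are better, False = lower values are better.
METRIC_DIRECTION: Dict[str, bool] = {
    "n50": True,             # contiguity (bp)
    "likelihood": True,      # e.g. a read-based assembly likelihood
    "misassemblies": False,  # count of structural errors
}


def rank_sums(scores: Dict[str, Dict[str, float]],
              direction: Dict[str, bool]) -> Dict[str, int]:
    """Sum each assembly's per-metric rank (1 = best; ties share a rank)."""
    totals = {name: 0 for name in scores}
    for metric, higher_is_better in direction.items():
        for name, metrics in scores.items():
            value = metrics[metric]
            # Rank = 1 + number of assemblies strictly better on this metric.
            better = sum(
                1
                for other in scores.values()
                if (other[metric] > value if higher_is_better else other[metric] < value)
            )
            totals[name] += better + 1
    return totals


def select_best(scores: Dict[str, Dict[str, float]],
                direction: Dict[str, bool]) -> str:
    """Pick the assembly with the lowest rank sum; break ties by name."""
    totals = rank_sums(scores, direction)
    return min(totals, key=lambda name: (totals[name], name))


# Made-up example values for three hypothetical assemblies.
example = {
    "assembler_A": {"n50": 120_000, "likelihood": -1.0e6, "misassemblies": 4},
    "assembler_B": {"n50": 95_000, "likelihood": -0.9e6, "misassemblies": 1},
    "assembler_C": {"n50": 60_000, "likelihood": -1.2e6, "misassemblies": 2},
}

if __name__ == "__main__":
    print(rank_sums(example, METRIC_DIRECTION))    # {'assembler_A': 6, 'assembler_B': 4, 'assembler_C': 8}
    print(select_best(example, METRIC_DIRECTION))  # assembler_B
```

A rank sum ignores how large the differences between assemblies are; a normalized or weighted score per metric would be an alternative design if some metrics should count more than others.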