
An Integrated Pipeline for de Novo Assembly of Microbial Genomes

Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi...


Bibliographic Details
Main authors: Tritt, Andrew; Eisen, Jonathan A.; Facciotti, Marc T.; Darling, Aaron E.
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441570/
https://www.ncbi.nlm.nih.gov/pubmed/23028432
http://dx.doi.org/10.1371/journal.pone.0042304
collection PubMed
description Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages. It integrates several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to those of a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning that SOAPdenovo requires. In particular, the assemblies produced by A5 exhibit a reduction of 50% or more in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics from mechanically sheared libraries. Finally, A5 has modest compute requirements and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.
id pubmed-3441570
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3441570 2012-10-01 An Integrated Pipeline for de Novo Assembly of Microbial Genomes. Tritt, Andrew; Eisen, Jonathan A.; Facciotti, Marc T.; Darling, Aaron E. PLoS One. Research Article.
Public Library of Science 2012-09-13 /pmc/articles/PMC3441570/ /pubmed/23028432 http://dx.doi.org/10.1371/journal.pone.0042304 Text en © 2012 Tritt et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article