An Integrated Pipeline for de Novo Assembly of Microbial Genomes
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi...
Main authors: | Tritt, Andrew; Eisen, Jonathan A.; Facciotti, Marc T.; Darling, Aaron E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441570/ https://www.ncbi.nlm.nih.gov/pubmed/23028432 http://dx.doi.org/10.1371/journal.pone.0042304 |
_version_ | 1782243321327910912 |
---|---|
author | Tritt, Andrew Eisen, Jonathan A. Facciotti, Marc T. Darling, Aaron E. |
author_facet | Tritt, Andrew Eisen, Jonathan A. Facciotti, Marc T. Darling, Aaron E. |
author_sort | Tritt, Andrew |
collection | PubMed |
description | Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics to mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage. |
format | Online Article Text |
id | pubmed-3441570 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3441570 2012-10-01 An Integrated Pipeline for de Novo Assembly of Microbial Genomes Tritt, Andrew Eisen, Jonathan A. Facciotti, Marc T. Darling, Aaron E. PLoS One Research Article Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics to mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage. Public Library of Science 2012-09-13 /pmc/articles/PMC3441570/ /pubmed/23028432 http://dx.doi.org/10.1371/journal.pone.0042304 Text en © 2012 Tritt et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Tritt, Andrew Eisen, Jonathan A. Facciotti, Marc T. Darling, Aaron E. An Integrated Pipeline for de Novo Assembly of Microbial Genomes |
title | An Integrated Pipeline for de Novo Assembly of Microbial Genomes |
title_full | An Integrated Pipeline for de Novo Assembly of Microbial Genomes |
title_fullStr | An Integrated Pipeline for de Novo Assembly of Microbial Genomes |
title_full_unstemmed | An Integrated Pipeline for de Novo Assembly of Microbial Genomes |
title_short | An Integrated Pipeline for de Novo Assembly of Microbial Genomes |
title_sort | integrated pipeline for de novo assembly of microbial genomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441570/ https://www.ncbi.nlm.nih.gov/pubmed/23028432 http://dx.doi.org/10.1371/journal.pone.0042304 |
work_keys_str_mv | AT trittandrew anintegratedpipelinefordenovoassemblyofmicrobialgenomes AT eisenjonathana anintegratedpipelinefordenovoassemblyofmicrobialgenomes AT facciottimarct anintegratedpipelinefordenovoassemblyofmicrobialgenomes AT darlingaarone anintegratedpipelinefordenovoassemblyofmicrobialgenomes AT trittandrew integratedpipelinefordenovoassemblyofmicrobialgenomes AT eisenjonathana integratedpipelinefordenovoassemblyofmicrobialgenomes AT facciottimarct integratedpipelinefordenovoassemblyofmicrobialgenomes AT darlingaarone integratedpipelinefordenovoassemblyofmicrobialgenomes |