GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers

De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers wi...

Bibliographic Details
Main authors: Jünemann, Sebastian; Prior, Karola; Albersmeier, Andreas; Albaum, Stefan; Kalinowski, Jörn; Goesmann, Alexander; Stoye, Jens; Harmsen, Dag
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157817/
https://www.ncbi.nlm.nih.gov/pubmed/25198770
http://dx.doi.org/10.1371/journal.pone.0107014
author Jünemann, Sebastian
Prior, Karola
Albersmeier, Andreas
Albaum, Stefan
Kalinowski, Jörn
Goesmann, Alexander
Stoye, Jens
Harmsen, Dag
author_sort Jünemann, Sebastian
collection PubMed
description De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast-evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms to a broad range of academic and clinical institutions. With a strong pragmatic focus, here, we give a novel insight into the line of assembly evaluation surveys as we benchmark popular de novo genome assemblers based on bacterial data generated by benchtop sequencers. Therefore, single-library assemblies were generated, assembled, and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn Graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage, where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy but also in their demands on computing power. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler suits best.
The data sets, scripts and all additional information needed to replicate our results are freely available at ftp://ftp.cebitec.uni-bielefeld.de/pub/GABenchToB.
format Online
Article
Text
id pubmed-4157817
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4157817 2014-09-09 GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers. PLoS One, Research Article. Public Library of Science 2014-09-08 /pmc/articles/PMC4157817/ /pubmed/25198770 http://dx.doi.org/10.1371/journal.pone.0107014 Text en © 2014 Jünemann et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title GABenchToB: A Genome Assembly Benchmark Tuned on Bacteria and Benchtop Sequencers
topic Research Article