
BBMerge – Accurate paired shotgun read merging via overlap

Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume...


Bibliographic Details
Main Authors: Bushnell, Brian; Rood, Jonathan; Singer, Esther
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657622/
https://www.ncbi.nlm.nih.gov/pubmed/29073143
http://dx.doi.org/10.1371/journal.pone.0185056
author Bushnell, Brian
Rood, Jonathan
Singer, Esther
collection PubMed
description Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets, and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
format Online
Article
Text
id pubmed-5657622
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5657622 2017-11-09 BBMerge – Accurate paired shotgun read merging via overlap Bushnell, Brian Rood, Jonathan Singer, Esther PLoS One Research Article Public Library of Science 2017-10-26 /pmc/articles/PMC5657622/ /pubmed/29073143 http://dx.doi.org/10.1371/journal.pone.0185056 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title BBMerge – Accurate paired shotgun read merging via overlap
topic Research Article