
Faucet: streaming de novo assembly graph construction



Bibliographic Details
Main Authors: Rozov, Roye; Goldshlager, Gil; Halperin, Eran; Shamir, Ron
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870852/
https://www.ncbi.nlm.nih.gov/pubmed/29036597
http://dx.doi.org/10.1093/bioinformatics/btx471
author Rozov, Roye
Goldshlager, Gil
Halperin, Eran
Shamir, Ron
collection PubMed
description MOTIVATION: We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased. RESULTS: Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most of the salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to those of state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; in this respect it broadly matched the performance of other assemblers optimized for resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14–110% higher mean NGA50 lengths than Minia, and 2- to 11-fold higher mean NGA50 lengths than LightAssembler, the only other streaming assembler available. AVAILABILITY AND IMPLEMENTATION: Faucet is available at https://github.com/Shamir-Lab/Faucet SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
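The abstract describes a two-pass streaming scheme that keeps coverage counts at junction k-mers and connections between junction pairs, but only at a high level. The following is a minimal, hypothetical sketch of that general idea, not Faucet's actual implementation. Assumptions not taken from the record: a Python set stands in for a compact k-mer membership structure (for example a Bloom filter), reverse complements are ignored, and "connections" are modeled simply as pairs of consecutive junctions on the same read. The names `first_pass`, `is_junction` and `second_pass` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple

BASES = "ACGT"


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def first_pass(reads: Iterable[str], k: int) -> Set[str]:
    """Pass 1: remember every k-mer in the stream; each read can then be discarded.

    Assumption: a plain set stands in for a compact membership structure
    such as a Bloom filter (hypothetical, for illustration only).
    """
    seen: Set[str] = set()
    for read in reads:
        seen.update(kmers(read, k))
    return seen


def is_junction(kmer: str, seen: Set[str]) -> bool:
    """True if the k-mer has more than one successor or predecessor in `seen`."""
    successors = sum(kmer[1:] + b in seen for b in BASES)
    predecessors = sum(b + kmer[:-1] in seen for b in BASES)
    return successors > 1 or predecessors > 1


def second_pass(
    reads: Iterable[str], k: int, seen: Set[str]
) -> Tuple[Dict[str, Dict[str, int]], Set[Tuple[str, str]]]:
    """Pass 2: stream the reads again, keeping only junction-level metadata.

    coverage[j][b] counts occurrences in the reads of junction k-mer j
    followed by base b. links holds consecutive junction pairs on the same
    read (a hypothetical simplification of junction-pair connections).
    """
    coverage: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    links: Set[Tuple[str, str]] = set()
    for read in reads:
        prev: Optional[str] = None
        for i, km in enumerate(kmers(read, k)):
            if not is_junction(km, seen):
                continue
            if i + k < len(read):  # the read extends past this junction
                coverage[km][read[i + k]] += 1
            if prev is not None:
                links.add((prev, km))
            prev = km
    return coverage, links


if __name__ == "__main__":
    reads = ["AACGTTC", "AACCTTC"]  # two reads differing at one base: a bubble
    seen = first_pass(reads, k=3)
    coverage, links = second_pass(reads, 3, seen)
    print({j: dict(c) for j, c in coverage.items()})  # {'AAC': {'G': 1, 'C': 1}}
    print(links)  # {('AAC', 'TTC')}
```

In this toy bubble the two reads diverge after the junction `AAC` and reconverge at the junction `TTC`, so the per-extension counts at `AAC` record the coverage of each branch, and the single link records that one path connects the two junctions.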
format Online
Article
Text
id pubmed-5870852
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5870852 2018-03-29 Faucet: streaming de novo assembly graph construction Rozov, Roye Goldshlager, Gil Halperin, Eran Shamir, Ron Bioinformatics Special Issue: Recomb-Seq/Recomb-Ccb/Recomb-Cg
Oxford University Press 2018-01-01 2017-07-24 /pmc/articles/PMC5870852/ /pubmed/29036597 http://dx.doi.org/10.1093/bioinformatics/btx471 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Faucet: streaming de novo assembly graph construction
topic Special Issue: Recomb-Seq/Recomb-Ccb/Recomb-Cg