Cargando…

Buffering updates enables efficient dynamic de Bruijn graphs

MOTIVATION: The de Bruijn graph has become a ubiquitous graph model for biological data ever since its initial introduction in the late 1990s. It has been used for a variety of purposes including genome assembly (Zerbino and Birney, 2008; Bankevich et al., 2012; Peng et al., 2012), variant detection...

Descripción completa

Detalles Bibliográficos
Autores principales: Alanko, Jarno, Alipanahi, Bahar, Settle, Jonathen, Boucher, Christina, Gagie, Travis
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Research Network of Computational and Structural Biotechnology 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8326735/
https://www.ncbi.nlm.nih.gov/pubmed/34377371
http://dx.doi.org/10.1016/j.csbj.2021.06.047
_version_ 1783731901593288704
author Alanko, Jarno
Alipanahi, Bahar
Settle, Jonathen
Boucher, Christina
Gagie, Travis
author_facet Alanko, Jarno
Alipanahi, Bahar
Settle, Jonathen
Boucher, Christina
Gagie, Travis
author_sort Alanko, Jarno
collection PubMed
description MOTIVATION: The de Bruijn graph has become a ubiquitous graph model for biological data ever since its initial introduction in the late 1990s. It has been used for a variety of purposes including genome assembly (Zerbino and Birney, 2008; Bankevich et al., 2012; Peng et al., 2012), variant detection (Alipanahi et al., 2020b; Iqbal et al., 2012), and storage of assembled genomes (Chikhi et al., 2016). For this reason, there have been over a dozen methods for building and representing the de Bruijn graph and its variants in a space and time efficient manner. RESULTS: With the exception of a few data structures (Muggli et al., 2019; Holley and Melsted, 2020; Crawford et al.,2018), compressed and compact de Bruijn graphs do not allow for the graph to be efficiently updated, meaning that data can be added or deleted. The most recent compressed dynamic de Bruijn graph (Alipanahi et al., 2020a), relies on dynamic bit vectors which are slow in theory and practice. To address this shortcoming, we present a compressed dynamic de Bruijn graph that removes the necessity of dynamic bit vectors by buffering data that should be added or removed from the graph. We implement our method, which we refer to as BufBOSS, and compare its performance to Bifrost, DynamicBOSS, and FDBG. Our experiments demonstrate that BufBOSS achieves attractive trade-offs compared to other tools in terms of time, memory and disk, and has the best deletion performance by an order of magnitude.
format Online
Article
Text
id pubmed-8326735
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-83267352021-08-09 Buffering updates enables efficient dynamic de Bruijn graphs Alanko, Jarno Alipanahi, Bahar Settle, Jonathen Boucher, Christina Gagie, Travis Comput Struct Biotechnol J Research Article MOTIVATION: The de Bruijn graph has become a ubiquitous graph model for biological data ever since its initial introduction in the late 1990s. It has been used for a variety of purposes including genome assembly (Zerbino and Birney, 2008; Bankevich et al., 2012; Peng et al., 2012), variant detection (Alipanahi et al., 2020b; Iqbal et al., 2012), and storage of assembled genomes (Chikhi et al., 2016). For this reason, there have been over a dozen methods for building and representing the de Bruijn graph and its variants in a space and time efficient manner. RESULTS: With the exception of a few data structures (Muggli et al., 2019; Holley and Melsted, 2020; Crawford et al.,2018), compressed and compact de Bruijn graphs do not allow for the graph to be efficiently updated, meaning that data can be added or deleted. The most recent compressed dynamic de Bruijn graph (Alipanahi et al., 2020a), relies on dynamic bit vectors which are slow in theory and practice. To address this shortcoming, we present a compressed dynamic de Bruijn graph that removes the necessity of dynamic bit vectors by buffering data that should be added or removed from the graph. We implement our method, which we refer to as BufBOSS, and compare its performance to Bifrost, DynamicBOSS, and FDBG. Our experiments demonstrate that BufBOSS achieves attractive trade-offs compared to other tools in terms of time, memory and disk, and has the best deletion performance by an order of magnitude. Research Network of Computational and Structural Biotechnology 2021-07-06 /pmc/articles/PMC8326735/ /pubmed/34377371 http://dx.doi.org/10.1016/j.csbj.2021.06.047 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Alanko, Jarno
Alipanahi, Bahar
Settle, Jonathen
Boucher, Christina
Gagie, Travis
Buffering updates enables efficient dynamic de Bruijn graphs
title Buffering updates enables efficient dynamic de Bruijn graphs
title_full Buffering updates enables efficient dynamic de Bruijn graphs
title_fullStr Buffering updates enables efficient dynamic de Bruijn graphs
title_full_unstemmed Buffering updates enables efficient dynamic de Bruijn graphs
title_short Buffering updates enables efficient dynamic de Bruijn graphs
title_sort buffering updates enables efficient dynamic de bruijn graphs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8326735/
https://www.ncbi.nlm.nih.gov/pubmed/34377371
http://dx.doi.org/10.1016/j.csbj.2021.06.047
work_keys_str_mv AT alankojarno bufferingupdatesenablesefficientdynamicdebruijngraphs
AT alipanahibahar bufferingupdatesenablesefficientdynamicdebruijngraphs
AT settlejonathen bufferingupdatesenablesefficientdynamicdebruijngraphs
AT boucherchristina bufferingupdatesenablesefficientdynamicdebruijngraphs
AT gagietravis bufferingupdatesenablesefficientdynamicdebruijngraphs