
Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT

Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both type...


Bibliographic Details
Main authors: Cracco, Andrea, Tomescu, Alexandru I.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2023
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538363/
https://www.ncbi.nlm.nih.gov/pubmed/37253540
http://dx.doi.org/10.1101/gr.277615.122
_version_ 1785113308279341056
author Cracco, Andrea
Tomescu, Alexandru I.
author_facet Cracco, Andrea
Tomescu, Alexandru I.
author_sort Cracco, Andrea
collection PubMed
description Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs, based on a new approach merging the k-mer counting step with the unitig construction step, as well as on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3× to 21× compared with the state-of-the-art tool Cuttlefish 2. When constructing the colored variant, GGCAT achieves speed-ups of 5× to 39× compared with the state-of-the-art tool BiFrost. Additionally, GGCAT is up to 480× faster than BiFrost for batch sequence queries on colored graphs.
format Online
Article
Text
id pubmed-10538363
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-10538363 2023-09-29 Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT Cracco, Andrea Tomescu, Alexandru I. Genome Res Methods Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs, based on a new approach merging the k-mer counting step with the unitig construction step, as well as on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3× to 21× compared with the state-of-the-art tool Cuttlefish 2. When constructing the colored variant, GGCAT achieves speed-ups of 5× to 39× compared with the state-of-the-art tool BiFrost. Additionally, GGCAT is up to 480× faster than BiFrost for batch sequence queries on colored graphs. Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538363/ /pubmed/37253540 http://dx.doi.org/10.1101/gr.277615.122 Text en © 2023 Cracco and Tomescu; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
spellingShingle Methods
Cracco, Andrea
Tomescu, Alexandru I.
Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT
title Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT
title_sort extremely fast construction and querying of compacted and colored de bruijn graphs with ggcat
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538363/
https://www.ncbi.nlm.nih.gov/pubmed/37253540
http://dx.doi.org/10.1101/gr.277615.122
work_keys_str_mv AT craccoandrea extremelyfastconstructionandqueryingofcompactedandcoloreddebruijngraphswithggcat
AT tomescualexandrui extremelyfastconstructionandqueryingofcompactedandcoloreddebruijngraphswithggcat