Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT
Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both type...
Main Authors: | Cracco, Andrea; Tomescu, Alexandru I. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press 2023 |
Subjects: | Methods |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538363/ https://www.ncbi.nlm.nih.gov/pubmed/37253540 http://dx.doi.org/10.1101/gr.277615.122 |
_version_ | 1785113308279341056 |
---|---|
author | Cracco, Andrea Tomescu, Alexandru I. |
author_facet | Cracco, Andrea Tomescu, Alexandru I. |
author_sort | Cracco, Andrea |
collection | PubMed |
description | Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs, based on a new approach merging the k-mer counting step with the unitig construction step, as well as on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3× to 21× compared with the state-of-the-art tool Cuttlefish 2. When constructing the colored variant, GGCAT achieves speed-ups of 5× to 39× compared with the state-of-the-art tool BiFrost. Additionally, GGCAT is up to 480× faster than BiFrost for batch sequence queries on colored graphs. |
format | Online Article Text |
id | pubmed-10538363 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-105383632023-09-29 Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT Cracco, Andrea Tomescu, Alexandru I. Genome Res Methods Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences and associate to each k-mer the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs, based on a new approach merging the k-mer counting step with the unitig construction step, as well as on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3× to 21× compared with the state-of-the-art tool Cuttlefish 2. When constructing the colored variant, GGCAT achieves speed-ups of 5× to 39× compared with the state-of-the-art tool BiFrost. Additionally, GGCAT is up to 480× faster than BiFrost for batch sequence queries on colored graphs. Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538363/ /pubmed/37253540 http://dx.doi.org/10.1101/gr.277615.122 Text en © 2023 Cracco and Tomescu; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Methods Cracco, Andrea Tomescu, Alexandru I. Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT |
title | Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT |
title_sort | extremely fast construction and querying of compacted and colored de bruijn graphs with ggcat |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538363/ https://www.ncbi.nlm.nih.gov/pubmed/37253540 http://dx.doi.org/10.1101/gr.277615.122 |