
Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2



Bibliographic Details
Main authors: Khan, Jamshed; Kokot, Marek; Deorowicz, Sebastian; Patro, Rob
Format: Online Article Text
Language: English
Published: BioMed Central, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454175/
https://www.ncbi.nlm.nih.gov/pubmed/36076275
http://dx.doi.org/10.1186/s13059-022-02743-6
Description
Summary: The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant sits upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this step often becomes a computational bottleneck. We present Cuttlefish 2, which significantly advances the state of the art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes (2.58 Tbp in total) from 4.5 days to 17–23 h, and it constructs the graph for 1.52 Tbp of white spruce reads in approximately 10 h, while the closest competitor requires 54–58 h and uses considerably more memory. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s13059-022-02743-6).
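To illustrate what "compacted de Bruijn graph" means, the following is a minimal, naive sketch: distinct k-mers are the vertices, two k-mers are linked when they overlap by k-1 bases, and every maximal non-branching path is merged into a single string (a maximal unitig). This is not the Cuttlefish 2 algorithm, which is engineered for terabase-scale inputs and low memory. The sketch assumes a small in-memory set of k-mers, considers only the forward strand (compacted de Bruijn graph tools typically also handle reverse complements via canonical k-mers), and its helper names (kmers_from_reads, successors, predecessors, maximal_unitigs) are hypothetical.

```python
"""Naive compaction of a forward-strand de Bruijn graph into maximal unitigs.

Illustrative sketch only; not the Cuttlefish 2 algorithm.
"""

ALPHABET = "ACGT"


def kmers_from_reads(reads, k):
    """Collect the distinct k-mers (graph vertices) occurring in the reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def successors(kmer, kmers):
    """k-mers in the set whose first k-1 bases equal the last k-1 of `kmer`."""
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kmers]


def predecessors(kmer, kmers):
    """k-mers in the set whose last k-1 bases equal the first k-1 of `kmer`."""
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kmers]


def maximal_unitigs(kmers):
    """Spell every maximal non-branching path of the graph as one string."""
    kmers = set(kmers)
    visited = set()
    unitigs = []
    for seed in sorted(kmers):
        if seed in visited:
            continue
        # Walk backwards while the link into `first` is non-branching on both
        # ends; reaching `seed` again means the path is an isolated cycle.
        first = seed
        while True:
            preds = predecessors(first, kmers)
            if len(preds) != 1:
                break
            prev = preds[0]
            if len(successors(prev, kmers)) != 1 or prev == seed:
                break
            first = prev
        # Walk forwards from the start of the unitig, absorbing k-mers
        # until a branch, a dead end, or an already-used k-mer.
        path = [first]
        visited.add(first)
        while True:
            succs = successors(path[-1], kmers)
            if len(succs) != 1:
                break
            nxt = succs[0]
            if len(predecessors(nxt, kmers)) != 1 or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
        # Consecutive k-mers overlap by k-1 bases, so each adds one base.
        unitigs.append(path[0] + "".join(km[-1] for km in path[1:]))
    return unitigs


if __name__ == "__main__":
    reads = ["ACGTA", "ACGTC"]
    print(maximal_unitigs(kmers_from_reads(reads, 3)))  # ['ACGT', 'GTA', 'GTC']
```

In the usage example, the k-mer CGT has two successors (GTA and GTC), so the path breaks there and yields three unitigs. Because this sketch keeps every k-mer in a Python set and tests neighbours by lookup, it is only suitable for toy inputs; the terabase-sized collections reported in the summary are the scale that tools such as Cuttlefish 2 target.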