
Simplitigs as an efficient and scalable representation of de Bruijn graphs

Bibliographic Details
Main authors: Břinda, Karel; Baym, Michael; Kucherov, Gregory
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8025321/
https://www.ncbi.nlm.nih.gov/pubmed/33823902
http://dx.doi.org/10.1186/s13059-021-02297-z
Description
Summary: de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. For the example of assemblies of model organisms and two bacterial pan-genomes, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs provide a substantial improvement in the cumulative sequence length and their number. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory, and index loading and query times, as demonstrated with large-scale examples of GenBank bacterial pan-genomes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02297-z.