Cargando…

Movi: a fast and cache-efficient full-text pangenome index

Efficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. Recently, a compressed-index data structure called the “move structure” was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The mov...

Descripción completa

Detalles Bibliográficos
Autores principales: Zakeri, Mohsen, Brown, Nathaniel K., Ahmed, Omar Y., Gagie, Travis, Langmead, Ben
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Cold Spring Harbor Laboratory 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635132/
https://www.ncbi.nlm.nih.gov/pubmed/37961660
http://dx.doi.org/10.1101/2023.11.04.565615
_version_ 1785146293848375296
author Zakeri, Mohsen
Brown, Nathaniel K.
Ahmed, Omar Y.
Gagie, Travis
Langmead, Ben
author_facet Zakeri, Mohsen
Brown, Nathaniel K.
Ahmed, Omar Y.
Gagie, Travis
Langmead, Ben
author_sort Zakeri, Mohsen
collection PubMed
description Efficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. Recently, a compressed-index data structure called the “move structure” was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While the size of the Movi’s index is larger than the r-index, it scales at a smaller rate for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute sophisticated matching queries needed for classification – such as pseudo-matching lengths – at least ten times faster than the fastest available methods. Movi achieves this speed by leveraging the move structure’s strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. Movi’s fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval.
format Online
Article
Text
id pubmed-10635132
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-106351322023-11-13 Movi: a fast and cache-efficient full-text pangenome index Zakeri, Mohsen Brown, Nathaniel K. Ahmed, Omar Y. Gagie, Travis Langmead, Ben bioRxiv Article Efficient pangenome indexes are promising tools for many applications, including rapid classification of nanopore sequencing reads. Recently, a compressed-index data structure called the “move structure” was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While the size of the Movi’s index is larger than the r-index, it scales at a smaller rate for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute sophisticated matching queries needed for classification – such as pseudo-matching lengths – at least ten times faster than the fastest available methods. Movi achieves this speed by leveraging the move structure’s strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. Movi’s fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made in a small and predictable time interval. Cold Spring Harbor Laboratory 2023-11-05 /pmc/articles/PMC10635132/ /pubmed/37961660 http://dx.doi.org/10.1101/2023.11.04.565615 Text en https://creativecommons.org/licenses/by/4.0/This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/) , which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Zakeri, Mohsen
Brown, Nathaniel K.
Ahmed, Omar Y.
Gagie, Travis
Langmead, Ben
Movi: a fast and cache-efficient full-text pangenome index
title Movi: a fast and cache-efficient full-text pangenome index
title_full Movi: a fast and cache-efficient full-text pangenome index
title_fullStr Movi: a fast and cache-efficient full-text pangenome index
title_full_unstemmed Movi: a fast and cache-efficient full-text pangenome index
title_short Movi: a fast and cache-efficient full-text pangenome index
title_sort movi: a fast and cache-efficient full-text pangenome index
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635132/
https://www.ncbi.nlm.nih.gov/pubmed/37961660
http://dx.doi.org/10.1101/2023.11.04.565615
work_keys_str_mv AT zakerimohsen moviafastandcacheefficientfulltextpangenomeindex
AT brownnathanielk moviafastandcacheefficientfulltextpangenomeindex
AT ahmedomary moviafastandcacheefficientfulltextpangenomeindex
AT gagietravis moviafastandcacheefficientfulltextpangenomeindex
AT langmeadben moviafastandcacheefficientfulltextpangenomeindex