SPUMONI 2: improved classification using a pangenome index of minimizer digests

Bibliographic Details
Main authors: Ahmed, Omar Y.; Rossi, Massimiliano; Gagie, Travis; Boucher, Christina; Langmead, Ben
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197461/
https://www.ncbi.nlm.nih.gov/pubmed/37202771
http://dx.doi.org/10.1186/s13059-023-02958-1
Description
Summary: Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2’s index is 65 times smaller than minimap2’s for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-02958-1.