Cargando…

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

MOTIVATION: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large data...

Descripción completa

Detalles Bibliográficos
Autores principales: Marchet, Camille, Iqbal, Zamin, Gautheret, Daniel, Salson, Mikaël, Chikhi, Rayan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355249/
https://www.ncbi.nlm.nih.gov/pubmed/32657392
http://dx.doi.org/10.1093/bioinformatics/btaa487
Descripción
Sumario:MOTIVATION: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets. RESULTS: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ∼4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances. AVAILABILITY AND IMPLEMENTATION: https://github.com/kamimrcht/REINDEER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.