Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences
We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) a...
Main authors: | Seiler, Enrico; Mehringer, Svenja; Darvish, Mitra; Turc, Etienne; Reinert, Knut |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8313605/ https://www.ncbi.nlm.nih.gov/pubmed/34337360 http://dx.doi.org/10.1016/j.isci.2021.102782 |
_version_ | 1783729385251012608 |
---|---|
author | Seiler, Enrico Mehringer, Svenja Darvish, Mitra Turc, Etienne Reinert, Knut |
author_facet | Seiler, Enrico Mehringer, Svenja Darvish, Mitra Turc, Etienne Reinert, Knut |
author_sort | Seiler, Enrico |
collection | PubMed |
description | We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara. |
format | Online Article Text |
id | pubmed-8313605 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8313605 2021-07-31 Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences Seiler, Enrico Mehringer, Svenja Darvish, Mitra Turc, Etienne Reinert, Knut iScience Article We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara. Elsevier 2021-06-24 /pmc/articles/PMC8313605/ /pubmed/34337360 http://dx.doi.org/10.1016/j.isci.2021.102782 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Seiler, Enrico Mehringer, Svenja Darvish, Mitra Turc, Etienne Reinert, Knut Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences |
title | Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences |
title_full | Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences |
title_fullStr | Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences |
title_full_unstemmed | Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences |
title_short | Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences |
title_sort | raptor: a fast and space-efficient pre-filter for querying very large collections of nucleotide sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8313605/ https://www.ncbi.nlm.nih.gov/pubmed/34337360 http://dx.doi.org/10.1016/j.isci.2021.102782 |