
Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments

MOTIVATION: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as the advances of hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful expe...


Bibliographic Details
Main authors: Darvish, Mitra; Seiler, Enrico; Mehringer, Svenja; Rahn, René; Reinert, Knut
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9438961/
https://www.ncbi.nlm.nih.gov/pubmed/35801930
http://dx.doi.org/10.1093/bioinformatics/btac492
author Darvish, Mitra
Seiler, Enrico
Mehringer, Svenja
Rahn, René
Reinert, Knut
author_sort Darvish, Mitra
collection PubMed
description MOTIVATION: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as the advances of hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful experiments in the stream of raw data. RESULTS: As a solution, we propose Needle, a fast and space-efficient index which can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters that each store a set of representative k-mers depending on their multiplicity in the raw data. This is then used to quantify the query. AVAILABILITY AND IMPLEMENTATION: https://github.com/seqan/needle. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9438961
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9438961 2022-09-06 Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments Darvish, Mitra Seiler, Enrico Mehringer, Svenja Rahn, René Reinert, Knut Bioinformatics Original Papers MOTIVATION: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as the advances of hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful experiments in the stream of raw data. RESULTS: As a solution, we propose Needle, a fast and space-efficient index which can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments in seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters that each store a set of representative k-mers depending on their multiplicity in the raw data. This is then used to quantify the query. AVAILABILITY AND IMPLEMENTATION: https://github.com/seqan/needle. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-07-08 /pmc/articles/PMC9438961/ /pubmed/35801930 http://dx.doi.org/10.1093/bioinformatics/btac492 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9438961/
https://www.ncbi.nlm.nih.gov/pubmed/35801930
http://dx.doi.org/10.1093/bioinformatics/btac492