Partitioned Interleaved Bloom filters using Optane DC Persistent Memory

Bibliographic Details
Main author:	Seiler, Enrico
Language:	eng
Published:	2019
Subjects:	other events or meetings
Online access:	http://cds.cern.ch/record/2691435

_version_	1780963853954187264
author	Seiler, Enrico
author_facet	Seiler, Enrico
author_sort	Seiler, Enrico
collection	CERN
description	The recent improvements in full-genome sequencing technologies, commonly subsumed under the term NGS (Next Generation Sequencing), have tremendously increased sequencing throughput. Within 10 years it rose from 21 billion base pairs collected over months to about 400 billion base pairs per day (the current throughput of Illumina's HiSeq 4000). The cost of producing one million base pairs has also fallen from 140,000 dollars to a few cents. As a result of this dramatic development, the number of new data submissions to genomic databases, generated by various biotechnological protocols (ChIP-Seq, RNA-Seq, etc.), has grown sharply and is expected to keep growing faster than storage devices become cheaper and larger. The main task in analyzing NGS data is to search for sequencing reads, short sequence patterns (e.g. exon/intron boundary read-through patterns), or expression profiles in large collections of sequences (i.e. a database). Searching the entirety of such databases is usually only possible by searching the metadata or a set of results initially obtained from the experiment. Searching (approximately) for a specific genomic sequence in all the data has not been possible in reasonable computational time. In this work we describe results for our new data structure, called a binning directory, which can distribute approximate search queries and builds on an extension of our recently introduced Interleaved Bloom Filters (IBF) called x-partitioned IBF (x-PIBF). The results presented here make use of Intel's Optane DC Persistent Memory architecture and achieve significant speedups compared to a disk-based solution.
id	cern-2691435
institution	Organización Europea para la Investigación Nuclear
language	eng
publishDate	2019
record_format	invenio
spelling	cern-2691435 2022-11-02T22:24:40Z http://cds.cern.ch/record/2691435 eng Seiler, Enrico Partitioned Interleaved Bloom filters using Optane DC Persistent Memory IXPUG 2019 Annual Conference at CERN other events or meetings oai:cds.cern.ch:2691435 2019
spellingShingle	other events or meetings Seiler, Enrico Partitioned Interleaved Bloom filters using Optane DC Persistent Memory
title	Partitioned Interleaved Bloom filters using Optane DC Persistent Memory
title_full	Partitioned Interleaved Bloom filters using Optane DC Persistent Memory
title_fullStr	Partitioned Interleaved Bloom filters using Optane DC Persistent Memory
title_full_unstemmed	Partitioned Interleaved Bloom filters using Optane DC Persistent Memory
title_short	Partitioned Interleaved Bloom filters using Optane DC Persistent Memory
title_sort	partitioned interleaved bloom filters using optane dc persistent memory
topic	other events or meetings
url	http://cds.cern.ch/record/2691435
work_keys_str_mv	AT seilerenrico partitionedinterleavedbloomfiltersusingoptanedcpersistentmemory AT seilerenrico ixpug2019annualconferenceatcern
