Cargando…

Suitability of a new Bloom filter for numerical vectors with high dimensions

The notable increase in the size and dimensions of data have presented challenges for data storage and retrieval. The Bloom filter and its generations, due to efficient space overheads and constant query delays, have been broadly applied to querying memberships of a big data set. However, the Bloom...

Descripción completa

Detalles Bibliográficos
Autores principales: Shuai, Chunyan, Lei, Jiayou, Gong, Zeweiyi, Ouyang, Xin
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6303090/
https://www.ncbi.nlm.nih.gov/pubmed/30576359
http://dx.doi.org/10.1371/journal.pone.0209159
_version_ 1783382115288612864
author Shuai, Chunyan
Lei, Jiayou
Gong, Zeweiyi
Ouyang, Xin
author_facet Shuai, Chunyan
Lei, Jiayou
Gong, Zeweiyi
Ouyang, Xin
author_sort Shuai, Chunyan
collection PubMed
description The notable increase in the size and dimensions of data have presented challenges for data storage and retrieval. The Bloom filter and its generations, due to efficient space overheads and constant query delays, have been broadly applied to querying memberships of a big data set. However, the Bloom filter and most of the variants regard each element as a 1-dimensional string and adopt multiple different string hashes to project the data. The interesting problem is when the inputs are numerical vectors with high dimensions, it remains unknown whether they can be projected into the Bloom filter in their original format. Furthermore, we investigate whether the projection is random and uniform. To address these problems, this paper presents a new uniform Prime-HD-BKDERhash family and a new Bloom filter (P-HDBF) to retrieve the membership of a big data set with the numerical high dimensions. Since the randomness and uniformity of data mapping determines the performance of the Bloom filter, to verify these properties, we first introduce information entropy. Our theoretical and experimental results show that the P-HDBF can randomly and uniformly map the data in their native formats. Moreover, the P-HDBF provides an efficient solution alternative to implement membership search with space-time overheads. This advantage may be suitable for engineering applications that are resource-constrained or identification of the nuances of the graphics and images.
format Online
Article
Text
id pubmed-6303090
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-63030902019-01-08 Suitability of a new Bloom filter for numerical vectors with high dimensions Shuai, Chunyan Lei, Jiayou Gong, Zeweiyi Ouyang, Xin PLoS One Research Article The notable increase in the size and dimensions of data have presented challenges for data storage and retrieval. The Bloom filter and its generations, due to efficient space overheads and constant query delays, have been broadly applied to querying memberships of a big data set. However, the Bloom filter and most of the variants regard each element as a 1-dimensional string and adopt multiple different string hashes to project the data. The interesting problem is when the inputs are numerical vectors with high dimensions, it remains unknown whether they can be projected into the Bloom filter in their original format. Furthermore, we investigate whether the projection is random and uniform. To address these problems, this paper presents a new uniform Prime-HD-BKDERhash family and a new Bloom filter (P-HDBF) to retrieve the membership of a big data set with the numerical high dimensions. Since the randomness and uniformity of data mapping determines the performance of the Bloom filter, to verify these properties, we first introduce information entropy. Our theoretical and experimental results show that the P-HDBF can randomly and uniformly map the data in their native formats. Moreover, the P-HDBF provides an efficient solution alternative to implement membership search with space-time overheads. This advantage may be suitable for engineering applications that are resource-constrained or identification of the nuances of the graphics and images. Public Library of Science 2018-12-21 /pmc/articles/PMC6303090/ /pubmed/30576359 http://dx.doi.org/10.1371/journal.pone.0209159 Text en © 2018 Shuai et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Shuai, Chunyan
Lei, Jiayou
Gong, Zeweiyi
Ouyang, Xin
Suitability of a new Bloom filter for numerical vectors with high dimensions
title Suitability of a new Bloom filter for numerical vectors with high dimensions
title_full Suitability of a new Bloom filter for numerical vectors with high dimensions
title_fullStr Suitability of a new Bloom filter for numerical vectors with high dimensions
title_full_unstemmed Suitability of a new Bloom filter for numerical vectors with high dimensions
title_short Suitability of a new Bloom filter for numerical vectors with high dimensions
title_sort suitability of a new bloom filter for numerical vectors with high dimensions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6303090/
https://www.ncbi.nlm.nih.gov/pubmed/30576359
http://dx.doi.org/10.1371/journal.pone.0209159
work_keys_str_mv AT shuaichunyan suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions
AT leijiayou suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions
AT gongzeweiyi suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions
AT ouyangxin suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions