Cargando…
Suitability of a new Bloom filter for numerical vectors with high dimensions
The notable increase in the size and dimensions of data have presented challenges for data storage and retrieval. The Bloom filter and its generations, due to efficient space overheads and constant query delays, have been broadly applied to querying memberships of a big data set. However, the Bloom...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2018
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6303090/ https://www.ncbi.nlm.nih.gov/pubmed/30576359 http://dx.doi.org/10.1371/journal.pone.0209159 |
_version_ | 1783382115288612864 |
---|---|
author | Shuai, Chunyan Lei, Jiayou Gong, Zeweiyi Ouyang, Xin |
author_facet | Shuai, Chunyan Lei, Jiayou Gong, Zeweiyi Ouyang, Xin |
author_sort | Shuai, Chunyan |
collection | PubMed |
description | The notable increase in the size and dimensions of data have presented challenges for data storage and retrieval. The Bloom filter and its generations, due to efficient space overheads and constant query delays, have been broadly applied to querying memberships of a big data set. However, the Bloom filter and most of the variants regard each element as a 1-dimensional string and adopt multiple different string hashes to project the data. The interesting problem is when the inputs are numerical vectors with high dimensions, it remains unknown whether they can be projected into the Bloom filter in their original format. Furthermore, we investigate whether the projection is random and uniform. To address these problems, this paper presents a new uniform Prime-HD-BKDERhash family and a new Bloom filter (P-HDBF) to retrieve the membership of a big data set with the numerical high dimensions. Since the randomness and uniformity of data mapping determines the performance of the Bloom filter, to verify these properties, we first introduce information entropy. Our theoretical and experimental results show that the P-HDBF can randomly and uniformly map the data in their native formats. Moreover, the P-HDBF provides an efficient solution alternative to implement membership search with space-time overheads. This advantage may be suitable for engineering applications that are resource-constrained or identification of the nuances of the graphics and images. |
format | Online Article Text |
id | pubmed-6303090 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-63030902019-01-08 Suitability of a new Bloom filter for numerical vectors with high dimensions Shuai, Chunyan Lei, Jiayou Gong, Zeweiyi Ouyang, Xin PLoS One Research Article The notable increase in the size and dimensions of data have presented challenges for data storage and retrieval. The Bloom filter and its generations, due to efficient space overheads and constant query delays, have been broadly applied to querying memberships of a big data set. However, the Bloom filter and most of the variants regard each element as a 1-dimensional string and adopt multiple different string hashes to project the data. The interesting problem is when the inputs are numerical vectors with high dimensions, it remains unknown whether they can be projected into the Bloom filter in their original format. Furthermore, we investigate whether the projection is random and uniform. To address these problems, this paper presents a new uniform Prime-HD-BKDERhash family and a new Bloom filter (P-HDBF) to retrieve the membership of a big data set with the numerical high dimensions. Since the randomness and uniformity of data mapping determines the performance of the Bloom filter, to verify these properties, we first introduce information entropy. Our theoretical and experimental results show that the P-HDBF can randomly and uniformly map the data in their native formats. Moreover, the P-HDBF provides an efficient solution alternative to implement membership search with space-time overheads. This advantage may be suitable for engineering applications that are resource-constrained or identification of the nuances of the graphics and images. Public Library of Science 2018-12-21 /pmc/articles/PMC6303090/ /pubmed/30576359 http://dx.doi.org/10.1371/journal.pone.0209159 Text en © 2018 Shuai et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Shuai, Chunyan Lei, Jiayou Gong, Zeweiyi Ouyang, Xin Suitability of a new Bloom filter for numerical vectors with high dimensions |
title | Suitability of a new Bloom filter for numerical vectors with high dimensions |
title_full | Suitability of a new Bloom filter for numerical vectors with high dimensions |
title_fullStr | Suitability of a new Bloom filter for numerical vectors with high dimensions |
title_full_unstemmed | Suitability of a new Bloom filter for numerical vectors with high dimensions |
title_short | Suitability of a new Bloom filter for numerical vectors with high dimensions |
title_sort | suitability of a new bloom filter for numerical vectors with high dimensions |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6303090/ https://www.ncbi.nlm.nih.gov/pubmed/30576359 http://dx.doi.org/10.1371/journal.pone.0209159 |
work_keys_str_mv | AT shuaichunyan suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions AT leijiayou suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions AT gongzeweiyi suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions AT ouyangxin suitabilityofanewbloomfilterfornumericalvectorswithhighdimensions |