MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of f...
Main authors: | Wang, Jingjing; Lin, Chen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation, 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431368/ https://www.ncbi.nlm.nih.gov/pubmed/26089861 http://dx.doi.org/10.1155/2015/217216 |
_version_ | 1782371335276593152 |
---|---|
author | Wang, Jingjing Lin, Chen |
author_facet | Wang, Jingjing Lin, Chen |
author_sort | Wang, Jingjing |
collection | PubMed |
description | Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), where a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using MapReduce framework to deal with similarity joins on large scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique, compared with state-of-the-art methods. |
format | Online Article Text |
id | pubmed-4431368 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-44313682015-06-18 MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data Wang, Jingjing Lin, Chen Comput Intell Neurosci Research Article Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), where a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using MapReduce framework to deal with similarity joins on large scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique, compared with state-of-the-art methods. Hindawi Publishing Corporation 2015 2015-04-30 /pmc/articles/PMC4431368/ /pubmed/26089861 http://dx.doi.org/10.1155/2015/217216 Text en Copyright © 2015 J. Wang and C. Lin. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Wang, Jingjing Lin, Chen MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data |
title | MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data |
title_full | MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data |
title_fullStr | MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data |
title_full_unstemmed | MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data |
title_short | MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data |
title_sort | mapreduce based personalized locality sensitive hashing for similarity joins on large scale data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431368/ https://www.ncbi.nlm.nih.gov/pubmed/26089861 http://dx.doi.org/10.1155/2015/217216 |
work_keys_str_mv | AT wangjingjing mapreducebasedpersonalizedlocalitysensitivehashingforsimilarityjoinsonlargescaledata AT linchen mapreducebasedpersonalizedlocalitysensitivehashingforsimilarityjoinsonlargescaledata |
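
The description in this record says that the number of false positives and false negatives in LSH depends on the banding scheme. As a rough illustration only, the sketch below uses the textbook banding model for MinHash-style LSH, in which a pair with similarity `s` becomes a candidate with probability `1 - (1 - s^r)^b` for `b` bands of `r` rows each. This is not the PLSH scheme proposed in the article, whose details are not given in this record; the function names, the `threshold` parameter, and the sample similarities are hypothetical.

```python
# Sketch of the standard LSH banding trade-off (an assumption for illustration,
# not the article's PLSH banding scheme).

def candidate_probability(s: float, rows: int, bands: int) -> float:
    """Probability that a pair with similarity s agrees on all rows of at least one band."""
    return 1.0 - (1.0 - s ** rows) ** bands


def expected_errors(similarities, threshold: float, rows: int, bands: int):
    """Expected false positives and false negatives over a list of pair similarities.

    A pair below the threshold that becomes a candidate counts as a false positive;
    a pair at or above the threshold that is missed counts as a false negative.
    """
    false_positives = 0.0
    false_negatives = 0.0
    for s in similarities:
        p = candidate_probability(s, rows, bands)
        if s < threshold:
            false_positives += p
        else:
            false_negatives += 1.0 - p
    return false_positives, false_negatives


if __name__ == "__main__":
    # Hypothetical pair similarities, for illustration only.
    sample = [0.1, 0.3, 0.5, 0.7, 0.9]
    for rows, bands in [(2, 10), (5, 20), (10, 10)]:
        fp, fn = expected_errors(sample, threshold=0.6, rows=rows, bands=bands)
        print(f"rows={rows} bands={bands} expected FP={fp:.3f} expected FN={fn:.3f}")
```

Under this model, adding bands raises the candidate probability for every pair (fewer false negatives, more false positives), while adding rows per band lowers it, which is the kind of trade-off the description refers to.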