MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of f...

Bibliographic Details
Main authors: Wang, Jingjing; Lin, Chen
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431368/
https://www.ncbi.nlm.nih.gov/pubmed/26089861
http://dx.doi.org/10.1155/2015/217216
_version_ 1782371335276593152
author Wang, Jingjing
Lin, Chen
author_facet Wang, Jingjing
Lin, Chen
author_sort Wang, Jingjing
collection PubMed
description Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), where a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using MapReduce framework to deal with similarity joins on large scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique, compared with state-of-the-art methods.
format Online
Article
Text
id pubmed-4431368
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-44313682015-06-18 MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data Wang, Jingjing Lin, Chen Comput Intell Neurosci Research Article Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives and false negatives is favored. To address these problems, in this paper we propose Personalized Locality Sensitive Hashing (PLSH), where a new banding scheme is embedded to tailor the number of false positives, false negatives, and the sum of both. PLSH is implemented in parallel using MapReduce framework to deal with similarity joins on large scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of our proposed PLSH technique, compared with state-of-the-art methods. Hindawi Publishing Corporation 2015 2015-04-30 /pmc/articles/PMC4431368/ /pubmed/26089861 http://dx.doi.org/10.1155/2015/217216 Text en Copyright © 2015 J. Wang and C. Lin. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Wang, Jingjing
Lin, Chen
MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_full MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_fullStr MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_full_unstemmed MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_short MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
title_sort mapreduce based personalized locality sensitive hashing for similarity joins on large scale data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431368/
https://www.ncbi.nlm.nih.gov/pubmed/26089861
http://dx.doi.org/10.1155/2015/217216
work_keys_str_mv AT wangjingjing mapreducebasedpersonalizedlocalitysensitivehashingforsimilarityjoinsonlargescaledata
AT linchen mapreducebasedpersonalizedlocalitysensitivehashingforsimilarityjoinsonlargescaledata