Cargando…

An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs

Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing...

Descripción completa

Detalles Bibliográficos
Autores principales: Cormode, Graham, Dasgupta, Anirban, Goyal, Amit, Lee, Chi Hoon
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773183/
https://www.ncbi.nlm.nih.gov/pubmed/29346410
http://dx.doi.org/10.1371/journal.pone.0191175
_version_ 1783293519552577536
author Cormode, Graham
Dasgupta, Anirban
Goyal, Amit
Lee, Chi Hoon
author_facet Cormode, Graham
Dasgupta, Anirban
Goyal, Amit
Lee, Chi Hoon
author_sort Cormode, Graham
collection PubMed
description Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (a.k.a, LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance, suitable for deployment in very large scale settings. The experimental results demonstrate our variants of LSH achieve the robust performance with better recall compared with “vanilla” LSH, even when using the same amount of space.
format Online
Article
Text
id pubmed-5773183
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-57731832018-01-26 An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs Cormode, Graham Dasgupta, Anirban Goyal, Amit Lee, Chi Hoon PLoS One Research Article Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (a.k.a, LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance, suitable for deployment in very large scale settings. The experimental results demonstrate our variants of LSH achieve the robust performance with better recall compared with “vanilla” LSH, even when using the same amount of space. Public Library of Science 2018-01-18 /pmc/articles/PMC5773183/ /pubmed/29346410 http://dx.doi.org/10.1371/journal.pone.0191175 Text en © 2018 Cormode et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Cormode, Graham
Dasgupta, Anirban
Goyal, Amit
Lee, Chi Hoon
An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
title An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
title_full An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
title_fullStr An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
title_full_unstemmed An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
title_short An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
title_sort evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773183/
https://www.ncbi.nlm.nih.gov/pubmed/29346410
http://dx.doi.org/10.1371/journal.pone.0191175
work_keys_str_mv AT cormodegraham anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs
AT dasguptaanirban anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs
AT goyalamit anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs
AT leechihoon anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs
AT cormodegraham evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs
AT dasguptaanirban evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs
AT goyalamit evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs
AT leechihoon evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs