Cargando…
An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs
Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2018
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773183/ https://www.ncbi.nlm.nih.gov/pubmed/29346410 http://dx.doi.org/10.1371/journal.pone.0191175 |
_version_ | 1783293519552577536 |
---|---|
author | Cormode, Graham Dasgupta, Anirban Goyal, Amit Lee, Chi Hoon |
author_facet | Cormode, Graham Dasgupta, Anirban Goyal, Amit Lee, Chi Hoon |
author_sort | Cormode, Graham |
collection | PubMed |
description | Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (a.k.a, LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance, suitable for deployment in very large scale settings. The experimental results demonstrate our variants of LSH achieve the robust performance with better recall compared with “vanilla” LSH, even when using the same amount of space. |
format | Online Article Text |
id | pubmed-5773183 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-57731832018-01-26 An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs Cormode, Graham Dasgupta, Anirban Goyal, Amit Lee, Chi Hoon PLoS One Research Article Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (a.k.a, LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance, suitable for deployment in very large scale settings. The experimental results demonstrate our variants of LSH achieve the robust performance with better recall compared with “vanilla” LSH, even when using the same amount of space. Public Library of Science 2018-01-18 /pmc/articles/PMC5773183/ /pubmed/29346410 http://dx.doi.org/10.1371/journal.pone.0191175 Text en © 2018 Cormode et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Cormode, Graham Dasgupta, Anirban Goyal, Amit Lee, Chi Hoon An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs |
title | An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs |
title_full | An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs |
title_fullStr | An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs |
title_full_unstemmed | An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs |
title_short | An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs |
title_sort | evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773183/ https://www.ncbi.nlm.nih.gov/pubmed/29346410 http://dx.doi.org/10.1371/journal.pone.0191175 |
work_keys_str_mv | AT cormodegraham anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs AT dasguptaanirban anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs AT goyalamit anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs AT leechihoon anevaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs AT cormodegraham evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs AT dasguptaanirban evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs AT goyalamit evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs AT leechihoon evaluationofmultiprobelocalitysensitivehashingforcomputingsimilaritiesoverwebscalequerylogs |