Cargando…

Provable randomized rounding for minimum-similarity diversification

When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with...

Descripción completa

Detalles Bibliográficos
Autores principales: Ordozgoiti, Bruno, Mahadevan, Ananth, Matakos, Antonis, Gionis, Aristides
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer US 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8964643/
https://www.ncbi.nlm.nih.gov/pubmed/35401029
http://dx.doi.org/10.1007/s10618-021-00811-2
_version_ 1784678262776004608
author Ordozgoiti, Bruno
Mahadevan, Ananth
Matakos, Antonis
Gionis, Aristides
author_facet Ordozgoiti, Bruno
Mahadevan, Ananth
Matakos, Antonis
Gionis, Aristides
author_sort Ordozgoiti, Bruno
collection PubMed
description When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.
format Online
Article
Text
id pubmed-8964643
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-89646432022-04-07 Provable randomized rounding for minimum-similarity diversification Ordozgoiti, Bruno Mahadevan, Ananth Matakos, Antonis Gionis, Aristides Data Min Knowl Discov Article When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature. Springer US 2022-01-04 2022 /pmc/articles/PMC8964643/ /pubmed/35401029 http://dx.doi.org/10.1007/s10618-021-00811-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Article
Ordozgoiti, Bruno
Mahadevan, Ananth
Matakos, Antonis
Gionis, Aristides
Provable randomized rounding for minimum-similarity diversification
title Provable randomized rounding for minimum-similarity diversification
title_full Provable randomized rounding for minimum-similarity diversification
title_fullStr Provable randomized rounding for minimum-similarity diversification
title_full_unstemmed Provable randomized rounding for minimum-similarity diversification
title_short Provable randomized rounding for minimum-similarity diversification
title_sort provable randomized rounding for minimum-similarity diversification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8964643/
https://www.ncbi.nlm.nih.gov/pubmed/35401029
http://dx.doi.org/10.1007/s10618-021-00811-2
work_keys_str_mv AT ordozgoitibruno provablerandomizedroundingforminimumsimilaritydiversification
AT mahadevanananth provablerandomizedroundingforminimumsimilaritydiversification
AT matakosantonis provablerandomizedroundingforminimumsimilaritydiversification
AT gionisaristides provablerandomizedroundingforminimumsimilaritydiversification