Cargando…

Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †

Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of [Formula: see text] elements with maximum diversity, as quantified by the dissimilarities among the elem...

Descripción completa

Detalles Bibliográficos
Autores principales: Wang, Yanhao, Fabbri, Francesco, Mathioudakis, Michael, Li, Jia
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378839/
https://www.ncbi.nlm.nih.gov/pubmed/37510013
http://dx.doi.org/10.3390/e25071066
_version_ 1785079865615056896
author Wang, Yanhao
Fabbri, Francesco
Mathioudakis, Michael
Li, Jia
author_facet Wang, Yanhao
Fabbri, Francesco
Mathioudakis, Michael
Li, Jia
author_sort Wang, Yanhao
collection PubMed
description Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of [Formula: see text] elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we study diversity maximization with fairness constraints in streaming and sliding-window models. Specifically, we focus on the max–min diversity maximization problem, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set X is partitioned into m disjoint groups by a specific sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains [Formula: see text] elements from each group [Formula: see text]. Although diversity maximization has been extensively studied, existing algorithms for fair max–min diversity maximization are inefficient for data streams. To address the problem, we first design efficient approximation algorithms for this problem in the (insert-only) streaming model, where data arrive one element at a time, and a solution should be computed based on the elements observed in one pass. Furthermore, we propose approximation algorithms for this problem in the sliding-window model, where only the latest w elements in the stream are considered for computation to capture the recency of the data. Experimental results on real-world and synthetic datasets show that our algorithms provide solutions of comparable quality to the state-of-the-art offline algorithms while running several orders of magnitude faster in the streaming and sliding-window settings.
format Online
Article
Text
id pubmed-10378839
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-103788392023-07-29 Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models † Wang, Yanhao Fabbri, Francesco Mathioudakis, Michael Li, Jia Entropy (Basel) Article Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of [Formula: see text] elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we study diversity maximization with fairness constraints in streaming and sliding-window models. Specifically, we focus on the max–min diversity maximization problem, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set X is partitioned into m disjoint groups by a specific sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains [Formula: see text] elements from each group [Formula: see text]. Although diversity maximization has been extensively studied, existing algorithms for fair max–min diversity maximization are inefficient for data streams. To address the problem, we first design efficient approximation algorithms for this problem in the (insert-only) streaming model, where data arrive one element at a time, and a solution should be computed based on the elements observed in one pass. Furthermore, we propose approximation algorithms for this problem in the sliding-window model, where only the latest w elements in the stream are considered for computation to capture the recency of the data. Experimental results on real-world and synthetic datasets show that our algorithms provide solutions of comparable quality to the state-of-the-art offline algorithms while running several orders of magnitude faster in the streaming and sliding-window settings. MDPI 2023-07-14 /pmc/articles/PMC10378839/ /pubmed/37510013 http://dx.doi.org/10.3390/e25071066 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Wang, Yanhao
Fabbri, Francesco
Mathioudakis, Michael
Li, Jia
Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †
title Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †
title_full Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †
title_fullStr Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †
title_full_unstemmed Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †
title_short Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †
title_sort fair max–min diversity maximization in streaming and sliding-window models †
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378839/
https://www.ncbi.nlm.nih.gov/pubmed/37510013
http://dx.doi.org/10.3390/e25071066
work_keys_str_mv AT wangyanhao fairmaxmindiversitymaximizationinstreamingandslidingwindowmodels
AT fabbrifrancesco fairmaxmindiversitymaximizationinstreamingandslidingwindowmodels
AT mathioudakismichael fairmaxmindiversitymaximizationinstreamingandslidingwindowmodels
AT lijia fairmaxmindiversitymaximizationinstreamingandslidingwindowmodels