Cargando…

Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck

At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) object...

Descripción completa

Detalles Bibliográficos
Autores principales:	Tan, Andrew K., Tegmark, Max, Chuang, Isaac L.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	MDPI 2022
Materias:	Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9222302/ https://www.ncbi.nlm.nih.gov/pubmed/35741492 http://dx.doi.org/10.3390/e24060771

_version_	1784732840623079424
author	Tan, Andrew K. Tegmark, Max Chuang, Isaac L.
author_facet	Tan, Andrew K. Tegmark, Max Chuang, Isaac L.
author_sort	Tan, Andrew K.
collection	PubMed
description	At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.
format	Online Article Text
id	pubmed-9222302
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-92223022022-06-24 Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck Tan, Andrew K. Tegmark, Max Chuang, Isaac L. Entropy (Basel) Article At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull. MDPI 2022-05-30 /pmc/articles/PMC9222302/ /pubmed/35741492 http://dx.doi.org/10.3390/e24060771 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Tan, Andrew K. Tegmark, Max Chuang, Isaac L. Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck
title	Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck
title_full	Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck
title_fullStr	Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck
title_full_unstemmed	Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck
title_short	Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck
title_sort	pareto-optimal clustering with the primal deterministic information bottleneck
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9222302/ https://www.ncbi.nlm.nih.gov/pubmed/35741492 http://dx.doi.org/10.3390/e24060771
work_keys_str_mv	AT tanandrewk paretooptimalclusteringwiththeprimaldeterministicinformationbottleneck AT tegmarkmax paretooptimalclusteringwiththeprimaldeterministicinformationbottleneck AT chuangisaacl paretooptimalclusteringwiththeprimaldeterministicinformationbottleneck

Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck

Ejemplares similares