Cargando…

Communication Efficient Algorithms for Bounding and Approximating the Empirical Entropy in Distributed Systems

The empirical entropy is a key statistical measure of data frequency vectors, enabling one to estimate how diverse the data are. From the computational point of view, it is important to quickly compute, approximate, or bound the entropy. In a distributed system, the representative (“global”) frequen...

Descripción completa

Detalles Bibliográficos
Autores principales: Shahar, Amit, Alfassi, Yuval, Keren, Daniel
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9689552/
https://www.ncbi.nlm.nih.gov/pubmed/36359705
http://dx.doi.org/10.3390/e24111611
Descripción
Sumario:The empirical entropy is a key statistical measure of data frequency vectors, enabling one to estimate how diverse the data are. From the computational point of view, it is important to quickly compute, approximate, or bound the entropy. In a distributed system, the representative (“global”) frequency vector is the average of the “local” frequency vectors, each residing in a distinct node. Typically, the trivial solution of aggregating the local vectors and computing their average incurs a huge communication overhead. Hence, the challenge is to approximate, or bound, the entropy of the global vector, while reducing communication overhead. In this paper, we develop algorithms which achieve this goal.