
Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences

Entropy estimation faces numerous challenges when applied to various real-world problems. Our interest is in divergence and entropy estimation algorithms which are capable of rapid estimation for natural sequence data such as human and synthetic languages. This typically requires a large amount of data; however, we propose a new approach which is based on a new rank-based analytic Zipf–Mandelbrot–Li probabilistic model. Unlike previous approaches, which do not consider the nature of the probability distribution in relation to language, here we introduce a novel analytic Zipfian model which includes linguistic constraints. This provides more accurate distributions for natural sequences such as natural or synthetic emergent languages. Results are given which indicate the performance of the proposed ZML model. We derive an entropy estimation method which incorporates the linguistic constraint-based Zipf–Mandelbrot–Li model into a new non-equiprobable coincidence counting algorithm which is shown to be effective for tasks such as entropy rate estimation with limited data.


Bibliographic Details
Main Authors: Back, Andrew D., Wiles, Janet
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8468050/
https://www.ncbi.nlm.nih.gov/pubmed/34573725
http://dx.doi.org/10.3390/e23091100
_version_ 1784573561139101696
author Back, Andrew D.
Wiles, Janet
author_facet Back, Andrew D.
Wiles, Janet
author_sort Back, Andrew D.
collection PubMed
description Entropy estimation faces numerous challenges when applied to various real-world problems. Our interest is in divergence and entropy estimation algorithms which are capable of rapid estimation for natural sequence data such as human and synthetic languages. This typically requires a large amount of data; however, we propose a new approach which is based on a new rank-based analytic Zipf–Mandelbrot–Li probabilistic model. Unlike previous approaches, which do not consider the nature of the probability distribution in relation to language, here we introduce a novel analytic Zipfian model which includes linguistic constraints. This provides more accurate distributions for natural sequences such as natural or synthetic emergent languages. Results are given which indicate the performance of the proposed ZML model. We derive an entropy estimation method which incorporates the linguistic constraint-based Zipf–Mandelbrot–Li model into a new non-equiprobable coincidence counting algorithm which is shown to be effective for tasks such as entropy rate estimation with limited data.
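The two ingredients named in the abstract, a rank-based Zipf–Mandelbrot distribution and entropy estimation by coincidence counting, can be sketched in plain Python. This is an illustrative sketch only, not the paper's algorithm: the parameter values `s` and `q` and all function names are assumptions, and the coincidence estimator shown recovers the standard Rényi order-2 (collision) entropy, a lower bound on Shannon entropy, rather than the paper's non-equiprobable variant.

```python
import math
from collections import Counter

def zipf_mandelbrot(n, s=1.05, q=2.7):
    """Zipf-Mandelbrot rank-frequency distribution over n symbols:
    p(r) proportional to 1 / (r + q)**s for ranks r = 1..n.
    Values of s and q here are illustrative, not from the paper."""
    weights = [1.0 / (r + q) ** s for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def shannon_entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def collision_entropy_from_samples(samples):
    """Coincidence-counting estimate: the fraction of matching
    unordered sample pairs estimates sum_i p_i**2, and its -log2
    is the Renyi order-2 (collision) entropy, which lower-bounds
    the Shannon entropy."""
    n = len(samples)
    counts = Counter(samples)
    coincidences = sum(c * (c - 1) for c in counts.values())
    p_coll = coincidences / (n * (n - 1))
    return -math.log2(p_coll)

# Example: entropy of a 26-symbol Zipf-Mandelbrot alphabet.
p = zipf_mandelbrot(26)
print(shannon_entropy_bits(p))
```

The point of the coincidence-counting route is the one the abstract highlights: repeated-symbol (collision) statistics converge with far fewer samples than a full plug-in frequency estimate, which is what makes entropy rate estimation with limited data feasible.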
format Online
Article
Text
id pubmed-8468050
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8468050 2021-09-27 Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences Back, Andrew D. Wiles, Janet Entropy (Basel) Article Entropy estimation faces numerous challenges when applied to various real-world problems. Our interest is in divergence and entropy estimation algorithms which are capable of rapid estimation for natural sequence data such as human and synthetic languages. This typically requires a large amount of data; however, we propose a new approach which is based on a new rank-based analytic Zipf–Mandelbrot–Li probabilistic model. Unlike previous approaches, which do not consider the nature of the probability distribution in relation to language, here we introduce a novel analytic Zipfian model which includes linguistic constraints. This provides more accurate distributions for natural sequences such as natural or synthetic emergent languages. Results are given which indicate the performance of the proposed ZML model. We derive an entropy estimation method which incorporates the linguistic constraint-based Zipf–Mandelbrot–Li model into a new non-equiprobable coincidence counting algorithm which is shown to be effective for tasks such as entropy rate estimation with limited data. MDPI 2021-08-24 /pmc/articles/PMC8468050/ /pubmed/34573725 http://dx.doi.org/10.3390/e23091100 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Back, Andrew D.
Wiles, Janet
Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences
title Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences
title_full Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences
title_fullStr Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences
title_full_unstemmed Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences
title_short Entropy Estimation Using a Linguistic Zipf–Mandelbrot–Li Model for Natural Sequences
title_sort entropy estimation using a linguistic zipf–mandelbrot–li model for natural sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8468050/
https://www.ncbi.nlm.nih.gov/pubmed/34573725
http://dx.doi.org/10.3390/e23091100
work_keys_str_mv AT backandrewd entropyestimationusingalinguisticzipfmandelbrotlimodelfornaturalsequences
AT wilesjanet entropyestimationusingalinguisticzipfmandelbrotlimodelfornaturalsequences