An Information Theoretic Approach to Symbolic Learning in Synthetic Languages

An important aspect of using entropy-based models and proposed “synthetic languages” is the seemingly simple task of knowing how to identify the probabilistic symbols. If the system has discrete features, this task may be trivial; however, for observed analog behaviors described by continuous values, the question arises of how such symbols should be determined. This task of symbolization extends the concept of scalar and vector quantization to consider explicit linguistic properties. Unlike previous quantization algorithms, where the aim is primarily data compression and fidelity, the goal here is to produce a symbolic output sequence which incorporates some linguistic properties and is hence useful for forming language-based models. In this paper, we therefore present methods for symbolization which take such properties into account in the form of probabilistic constraints. In particular, we propose new symbolization algorithms which constrain the symbols to follow a Zipf–Mandelbrot–Li distribution, which approximates the behavior of language elements. We introduce a novel constrained EM algorithm which is shown to effectively learn to produce symbols approximating a Zipfian distribution. We demonstrate the efficacy of the proposed approaches on examples using real-world data in different tasks, including the translation of animal behavior into a possible human-language-understandable equivalent.
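
For context, the Zipf–Mandelbrot–Li rank-frequency law referred to in the abstract has the general form p(r) ∝ 1/(r + β)^α over symbol ranks r. The Python sketch below is a minimal, hypothetical illustration of the idea of Zipf-constrained symbolization, not the authors' algorithm from the paper: it fits a 1-D Gaussian mixture by EM and, after each M-step, blends the empirical mixture weights toward a Zipf–Mandelbrot target. All function names, parameters (alpha, beta, lam), and defaults here are assumptions made for illustration.

```python
import numpy as np

def zipf_mandelbrot(n_symbols, alpha=1.0, beta=2.7):
    # Target rank-frequency law p(r) ~ 1 / (r + beta)^alpha.
    # alpha/beta defaults are illustrative assumptions, not values from the paper.
    ranks = np.arange(1, n_symbols + 1)
    p = 1.0 / (ranks + beta) ** alpha
    return p / p.sum()

def zipf_constrained_symbolization(x, n_symbols=8, n_iter=50, lam=0.5, seed=0):
    # Toy EM-style quantizer (hypothetical, not the paper's constrained EM):
    # the E-step soft-assigns 1-D samples to Gaussian symbol centers; the
    # M-step re-estimates centers/widths and pulls the mixture weights toward
    # the Zipf-Mandelbrot target, with lam controlling constraint strength.
    rng = np.random.default_rng(seed)
    target = zipf_mandelbrot(n_symbols)
    mu = rng.choice(x, size=n_symbols, replace=False)   # initial symbol centers
    sigma = np.full(n_symbols, x.std() + 1e-9)
    w = np.full(n_symbols, 1.0 / n_symbols)
    for _ in range(n_iter):
        # E-step: responsibilities under a 1-D Gaussian mixture
        d = (x[:, None] - mu[None, :]) / sigma[None, :]
        log_r = np.log(w + 1e-12) - 0.5 * d ** 2 - np.log(sigma)
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: standard GMM updates for centers and widths
        nk = r.sum(axis=0) + 1e-12
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk) + 1e-6
        # Constraint: assign the Zipfian target masses to components by their
        # empirical frequency rank, then blend toward that target.
        w_emp = nk / nk.sum()
        order = np.argsort(-w_emp)
        w_target = np.empty_like(target)
        w_target[order] = target
        w = (1.0 - lam) * w_emp + lam * w_target
    return r.argmax(axis=1), mu, w  # symbol sequence, centers, learned weights

# Usage on a synthetic continuous "behavior" signal (four analog modes)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(m, 0.3, 400) for m in range(4)])
symbols, centers, weights = zipf_constrained_symbolization(x)
print(np.bincount(symbols, minlength=8) / len(symbols))
```

Running the usage lines prints the observed symbol frequencies, which can be compared against zipf_mandelbrot(8) to see how strongly the lam blend has shaped the symbol distribution.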

Bibliographic Details
Main Authors: Back, Andrew D.; Wiles, Janet
Format: Online Article (Text)
Language: English
Journal: Entropy (Basel)
Published: MDPI, 10 February 2022
License: © 2022 by the authors; open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/)
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8871184/
https://www.ncbi.nlm.nih.gov/pubmed/35205553
http://dx.doi.org/10.3390/e24020259