
Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited

As we discuss, a stationary stochastic process is nonergodic when a random persistent topic can be detected in the infinite random text sampled from the process, whereas we call the process strongly nonergodic when an infinite sequence of independent random bits, called probabilistic facts, is needed to describe this topic completely.


Bibliographic Details
Main Author: Dębowski, Łukasz
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512648/
https://www.ncbi.nlm.nih.gov/pubmed/33265176
http://dx.doi.org/10.3390/e20020085
_version_ 1783586206743789568
author Dębowski, Łukasz
collection PubMed
description As we discuss, a stationary stochastic process is nonergodic when a random persistent topic can be detected in the infinite random text sampled from the process, whereas we call the process strongly nonergodic when an infinite sequence of independent random bits, called probabilistic facts, is needed to describe this topic completely. Replacing probabilistic facts with an algorithmically random sequence of bits, called algorithmic facts, we adapt this property back to ergodic processes. Subsequently, we call a process perigraphic if the number of algorithmic facts which can be inferred from a finite text sampled from the process grows like a power of the text length. We present a simple example of such a process. Moreover, we demonstrate an assertion which we call the theorem about facts and words. This proposition states that the number of probabilistic or algorithmic facts which can be inferred from a text drawn from a process must be roughly smaller than the number of distinct word-like strings detected in this text by means of the Prediction by Partial Matching (PPM) compression algorithm. We also observe that the number of the word-like strings for a sample of plays by Shakespeare follows an empirical stepwise power law, in a stark contrast to Markov processes. Hence, we suppose that natural language considered as a process is not only non-Markov but also perigraphic.
format Online
Article
Text
id pubmed-7512648
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7512648 2020-11-09 Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited Dębowski, Łukasz Entropy (Basel) Article MDPI 2018-01-26 /pmc/articles/PMC7512648/ /pubmed/33265176 http://dx.doi.org/10.3390/e20020085 Text en © 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Dębowski, Łukasz
Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited
title Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited
topic Article
work_keys_str_mv AT debowskiłukasz isnaturallanguageaperigraphicprocessthetheoremaboutfactsandwordsrevisited