On the physical origin of linguistic laws and lognormality in speech

Physical manifestations of linguistic units include sources of variability due to factors of speech production which are by definition excluded from counts of linguistic symbols. In this work, we examine whether linguistic laws hold with respect to the physical manifestations of linguistic units in...

Bibliographic Details
Main authors: Torre, Iván G.; Luque, Bartolo; Lacasa, Lucas; Kello, Christopher T.; Hernández-Fernández, Antoni
Format: Online Article Text
Language: English
Published: The Royal Society 2019
Subjects: Physics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6731709/
https://www.ncbi.nlm.nih.gov/pubmed/31598263
http://dx.doi.org/10.1098/rsos.191023
_version_ 1783449718456582144
author Torre, Iván G.
Luque, Bartolo
Lacasa, Lucas
Kello, Christopher T.
Hernández-Fernández, Antoni
author_facet Torre, Iván G.
Luque, Bartolo
Lacasa, Lucas
Kello, Christopher T.
Hernández-Fernández, Antoni
author_sort Torre, Iván G.
collection PubMed
description Physical manifestations of linguistic units include sources of variability due to factors of speech production which are by definition excluded from counts of linguistic symbols. In this work, we examine whether linguistic laws hold with respect to the physical manifestations of linguistic units in spoken English. The data we analyse come from a phonetically transcribed database of acoustic recordings of spontaneous speech known as the Buckeye Speech corpus. First, we verify with unprecedented accuracy that acoustically transcribed durations of linguistic units at several scales comply with a lognormal distribution, and we quantitatively justify this ‘lognormality law’ using a stochastic generative model. Second, we explore the four classical linguistic laws (Zipf’s Law, Herdan’s Law, Brevity Law and Menzerath–Altmann’s Law (MAL)) in oral communication, both in physical units and in symbolic units measured in the speech transcriptions, and find that the validity of these laws is typically stronger when using physical units than when using their symbolic counterparts. Additional results include (i) coining a Herdan’s Law in physical units, (ii) a precise mathematical formulation of Brevity Law, which we show to be connected to optimal compression principles in information theory and which allows us to formulate and validate yet another law, which we call the size-rank law, and (iii) a mathematical derivation of MAL which also highlights an additional regime where the law is inverted. Altogether, these results support the hypothesis that statistical laws in language have a physical origin.
format Online
Article
Text
id pubmed-6731709
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-67317092019-10-09 On the physical origin of linguistic laws and lognormality in speech Torre, Iván G. Luque, Bartolo Lacasa, Lucas Kello, Christopher T. Hernández-Fernández, Antoni R Soc Open Sci Physics Physical manifestations of linguistic units include sources of variability due to factors of speech production which are by definition excluded from counts of linguistic symbols. In this work, we examine whether linguistic laws hold with respect to the physical manifestations of linguistic units in spoken English. The data we analyse come from a phonetically transcribed database of acoustic recordings of spontaneous speech known as the Buckeye Speech corpus. First, we verify with unprecedented accuracy that acoustically transcribed durations of linguistic units at several scales comply with a lognormal distribution, and we quantitatively justify this ‘lognormality law’ using a stochastic generative model. Second, we explore the four classical linguistic laws (Zipf’s Law, Herdan’s Law, Brevity Law and Menzerath–Altmann’s Law (MAL)) in oral communication, both in physical units and in symbolic units measured in the speech transcriptions, and find that the validity of these laws is typically stronger when using physical units than when using their symbolic counterparts. Additional results include (i) coining a Herdan’s Law in physical units, (ii) a precise mathematical formulation of Brevity Law, which we show to be connected to optimal compression principles in information theory and which allows us to formulate and validate yet another law, which we call the size-rank law, and (iii) a mathematical derivation of MAL which also highlights an additional regime where the law is inverted. Altogether, these results support the hypothesis that statistical laws in language have a physical origin. The Royal Society 2019-08-21 /pmc/articles/PMC6731709/ /pubmed/31598263 http://dx.doi.org/10.1098/rsos.191023 Text en © 2019 The Authors.
http://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
spellingShingle Physics
Torre, Iván G.
Luque, Bartolo
Lacasa, Lucas
Kello, Christopher T.
Hernández-Fernández, Antoni
On the physical origin of linguistic laws and lognormality in speech
title On the physical origin of linguistic laws and lognormality in speech
title_full On the physical origin of linguistic laws and lognormality in speech
title_fullStr On the physical origin of linguistic laws and lognormality in speech
title_full_unstemmed On the physical origin of linguistic laws and lognormality in speech
title_short On the physical origin of linguistic laws and lognormality in speech
title_sort on the physical origin of linguistic laws and lognormality in speech
topic Physics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6731709/
https://www.ncbi.nlm.nih.gov/pubmed/31598263
http://dx.doi.org/10.1098/rsos.191023
work_keys_str_mv AT torreivang onthephysicaloriginoflinguisticlawsandlognormalityinspeech
AT luquebartolo onthephysicaloriginoflinguisticlawsandlognormalityinspeech
AT lacasalucas onthephysicaloriginoflinguisticlawsandlognormalityinspeech
AT kellochristophert onthephysicaloriginoflinguisticlawsandlognormalityinspeech
AT hernandezfernandezantoni onthephysicaloriginoflinguisticlawsandlognormalityinspeech