Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk

Bibliographic Details
Main Authors: Ren, Geng, Takahashi, Shuntaro, Tanaka-Ishii, Kumiko
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514546/
http://dx.doi.org/10.3390/e21121201
_version_ 1783586612865662976
author Ren, Geng
Takahashi, Shuntaro
Tanaka-Ishii, Kumiko
author_facet Ren, Geng
Takahashi, Shuntaro
Tanaka-Ishii, Kumiko
author_sort Ren, Geng
collection PubMed
description The entropy rate h of a natural language quantifies the complexity underlying the language. While recent studies have used computational approaches to estimate this rate, their results rely fundamentally on the performance of the language model used for prediction. On the other hand, in 1951, Shannon conducted a cognitive experiment to estimate the rate without the use of any such artifact. Shannon’s experiment, however, used only one subject, bringing into question the statistical validity of his value of h = 1.3 bits per character for the English language entropy rate. In this study, we conducted Shannon’s experiment on a much larger scale to reevaluate the entropy rate h via Amazon’s Mechanical Turk, a crowd-sourcing service. The online subjects recruited through Mechanical Turk were each asked to guess the succeeding character after being given the preceding characters, until obtaining the correct answer. We collected 172,954 character predictions and analyzed these predictions with a bootstrap technique. The analysis suggests that a large number of character predictions per context length, perhaps as many as 10^3, would be necessary to obtain a convergent estimate of the entropy rate, and if fewer predictions are used, the resulting h value may be underestimated. Our final entropy estimate was h = 1.22 bits per character.
format Online
Article
Text
id pubmed-7514546
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7514546 2020-11-09 Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk Ren, Geng Takahashi, Shuntaro Tanaka-Ishii, Kumiko Entropy (Basel) Article The entropy rate h of a natural language quantifies the complexity underlying the language. While recent studies have used computational approaches to estimate this rate, their results rely fundamentally on the performance of the language model used for prediction. On the other hand, in 1951, Shannon conducted a cognitive experiment to estimate the rate without the use of any such artifact. Shannon’s experiment, however, used only one subject, bringing into question the statistical validity of his value of h = 1.3 bits per character for the English language entropy rate. In this study, we conducted Shannon’s experiment on a much larger scale to reevaluate the entropy rate h via Amazon’s Mechanical Turk, a crowd-sourcing service. The online subjects recruited through Mechanical Turk were each asked to guess the succeeding character after being given the preceding characters, until obtaining the correct answer. We collected 172,954 character predictions and analyzed these predictions with a bootstrap technique. The analysis suggests that a large number of character predictions per context length, perhaps as many as 10^3, would be necessary to obtain a convergent estimate of the entropy rate, and if fewer predictions are used, the resulting h value may be underestimated. Our final entropy estimate was h = 1.22 bits per character. MDPI 2019-12-06 /pmc/articles/PMC7514546/ http://dx.doi.org/10.3390/e21121201 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ren, Geng
Takahashi, Shuntaro
Tanaka-Ishii, Kumiko
Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk
title Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk
title_full Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk
title_fullStr Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk
title_full_unstemmed Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk
title_short Entropy Rate Estimation for English via a Large Cognitive Experiment Using Mechanical Turk
title_sort entropy rate estimation for english via a large cognitive experiment using mechanical turk
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514546/
http://dx.doi.org/10.3390/e21121201
work_keys_str_mv AT rengeng entropyrateestimationforenglishviaalargecognitiveexperimentusingmechanicalturk
AT takahashishuntaro entropyrateestimationforenglishviaalargecognitiveexperimentusingmechanicalturk
AT tanakaishiikumiko entropyrateestimationforenglishviaalargecognitiveexperimentusingmechanicalturk
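
For illustration only: guessing-game data of the kind described in the abstract can be turned into entropy bounds via Shannon's classic 1951 argument. If q_i is the fraction of predictions in which the subject's i-th guess was the first correct one, the guess-number entropy -sum_i q_i log2 q_i upper-bounds the entropy rate, and sum_i i (q_i - q_{i+1}) log2 i lower-bounds it. The Python sketch below computes both bounds and a simple bootstrap over the raw guess numbers; it is not the authors' pipeline, the function names are ours, and the sample guess numbers are invented.

import math
import random

def shannon_bounds(guess_counts):
    """Shannon (1951) bounds on the entropy rate from guessing-game data.

    guess_counts[i] = number of predictions that needed exactly i+1 guesses.
    Returns (lower, upper) in bits per character.
    """
    total = sum(guess_counts)
    q = [c / total for c in guess_counts]           # q[i] = P(guess number = i+1)
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    q_next = q[1:] + [0.0]                          # q_{i+1}, zero past the alphabet
    lower = sum((i + 1) * (qi - qn) * math.log2(i + 1)
                for i, (qi, qn) in enumerate(zip(q, q_next)))
    return lower, upper

def bootstrap_upper(samples, n_resamples=1000, alphabet=27, seed=0):
    """Median bootstrap estimate of the upper bound from 1-based guess numbers.

    alphabet=27 follows Shannon's setup: 26 letters plus the space character.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(samples) for _ in samples]
        counts = [0] * alphabet
        for g in resample:
            counts[g - 1] += 1
        estimates.append(shannon_bounds(counts)[1])
    estimates.sort()
    return estimates[len(estimates) // 2]

# Hypothetical data: the guess number recorded for each of 15 predictions.
samples = [1, 1, 1, 2, 1, 3, 1, 2, 5, 1, 1, 4, 2, 1, 1]
counts = [samples.count(i + 1) for i in range(27)]
low, high = shannon_bounds(counts)
print(f"lower ≈ {low:.2f}, upper ≈ {high:.2f} bits/char, "
      f"bootstrap upper ≈ {bootstrap_upper(samples):.2f}")

With real data, one such upper-bound estimate is computed per context length, which is where the paper's observation about needing many predictions per context length to avoid underestimating h comes into play.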