Human and computer estimations of Predictability of words in written language

When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In para...

Bibliographic Details
Main authors: Bianchi, Bruno; Bengolea Monzón, Gastón; Ferrer, Luciana; Fernández Slezak, Diego; Shalom, Diego E.; Kamienkowski, Juan E.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064512/
https://www.ncbi.nlm.nih.gov/pubmed/32157161
http://dx.doi.org/10.1038/s41598-020-61353-z
_version_ 1783504884910260224
author Bianchi, Bruno
Bengolea Monzón, Gastón
Ferrer, Luciana
Fernández Slezak, Diego
Shalom, Diego E.
Kamienkowski, Juan E.
author_sort Bianchi, Bruno
collection PubMed
description When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field evolved by developing a wide variety of applications. Here, we show that using different word embeddings techniques (like Latent Semantic Analysis, Word2Vec, and FastText) and N-gram-based language models we were able to estimate how humans predict words (cloze-task Predictability) and how to better understand eye movements in long Spanish texts. Both types of models partially captured aspects of predictability. On the one hand, our N-gram model performed well when added as a replacement for the cloze-task Predictability of the fixated word. On the other hand, word embeddings were useful to mimic Predictability of the following word. Our study joins efforts from neurolinguistic and NLP fields to understand human information processing during reading to potentially improve NLP algorithms.
format Online
Article
Text
id pubmed-7064512
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-70645122020-03-18 Human and computer estimations of Predictability of words in written language Bianchi, Bruno Bengolea Monzón, Gastón Ferrer, Luciana Fernández Slezak, Diego Shalom, Diego E. Kamienkowski, Juan E. Sci Rep Article When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field evolved by developing a wide variety of applications. Here, we show that using different word embeddings techniques (like Latent Semantic Analysis, Word2Vec, and FastText) and N-gram-based language models we were able to estimate how humans predict words (cloze-task Predictability) and how to better understand eye movements in long Spanish texts. Both types of models partially captured aspects of predictability. On the one hand, our N-gram model performed well when added as a replacement for the cloze-task Predictability of the fixated word. On the other hand, word embeddings were useful to mimic Predictability of the following word. Our study joins efforts from neurolinguistic and NLP fields to understand human information processing during reading to potentially improve NLP algorithms. Nature Publishing Group UK 2020-03-10 /pmc/articles/PMC7064512/ /pubmed/32157161 http://dx.doi.org/10.1038/s41598-020-61353-z Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Bianchi, Bruno
Bengolea Monzón, Gastón
Ferrer, Luciana
Fernández Slezak, Diego
Shalom, Diego E.
Kamienkowski, Juan E.
Human and computer estimations of Predictability of words in written language
title Human and computer estimations of Predictability of words in written language
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064512/
https://www.ncbi.nlm.nih.gov/pubmed/32157161
http://dx.doi.org/10.1038/s41598-020-61353-z