Human and computer estimations of Predictability of words in written language
When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In para...
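Main authors: | Bianchi, Bruno; Bengolea Monzón, Gastón; Ferrer, Luciana; Fernández Slezak, Diego; Shalom, Diego E.; Kamienkowski, Juan E. |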
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064512/ https://www.ncbi.nlm.nih.gov/pubmed/32157161 http://dx.doi.org/10.1038/s41598-020-61353-z |
_version_ | 1783504884910260224 |
---|---|
author | Bianchi, Bruno Bengolea Monzón, Gastón Ferrer, Luciana Fernández Slezak, Diego Shalom, Diego E. Kamienkowski, Juan E. |
author_facet | Bianchi, Bruno Bengolea Monzón, Gastón Ferrer, Luciana Fernández Slezak, Diego Shalom, Diego E. Kamienkowski, Juan E. |
author_sort | Bianchi, Bruno |
collection | PubMed |
description | When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field evolved by developing a wide variety of applications. Here, we show that using different word embeddings techniques (like Latent Semantic Analysis, Word2Vec, and FastText) and N-gram-based language models we were able to estimate how humans predict words (cloze-task Predictability) and how to better understand eye movements in long Spanish texts. Both types of models partially captured aspects of predictability. On the one hand, our N-gram model performed well when added as a replacement for the cloze-task Predictability of the fixated word. On the other hand, word embeddings were useful to mimic Predictability of the following word. Our study joins efforts from neurolinguistic and NLP fields to understand human information processing during reading to potentially improve NLP algorithms. |
format | Online Article Text |
id | pubmed-7064512 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-70645122020-03-18 Human and computer estimations of Predictability of words in written language Bianchi, Bruno Bengolea Monzón, Gastón Ferrer, Luciana Fernández Slezak, Diego Shalom, Diego E. Kamienkowski, Juan E. Sci Rep Article When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field evolved by developing a wide variety of applications. Here, we show that using different word embeddings techniques (like Latent Semantic Analysis, Word2Vec, and FastText) and N-gram-based language models we were able to estimate how humans predict words (cloze-task Predictability) and how to better understand eye movements in long Spanish texts. Both types of models partially captured aspects of predictability. On the one hand, our N-gram model performed well when added as a replacement for the cloze-task Predictability of the fixated word. On the other hand, word embeddings were useful to mimic Predictability of the following word. Our study joins efforts from neurolinguistic and NLP fields to understand human information processing during reading to potentially improve NLP algorithms. Nature Publishing Group UK 2020-03-10 /pmc/articles/PMC7064512/ /pubmed/32157161 http://dx.doi.org/10.1038/s41598-020-61353-z Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Human and computer estimations of Predictability of words in written language |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7064512/ https://www.ncbi.nlm.nih.gov/pubmed/32157161 http://dx.doi.org/10.1038/s41598-020-61353-z |