RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese
This article presents RastrOS, a new eye-tracking corpus of eye movement data from university students during silent reading of paragraphs of texts in Brazilian Portuguese (BP). The article shows the potential of the corpus for natural language processing (NLP) using it to evaluate the sentence comp...
Main authors: | Leal, Sidney Evaldo; Lukasova, Katerina; Carthery-Goulart, Maria Teresa; Aluísio, Sandra Maria |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Netherlands 2022 |
Subjects: | Project Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9383681/ https://www.ncbi.nlm.nih.gov/pubmed/35990365 http://dx.doi.org/10.1007/s10579-022-09609-0 |
_version_ | 1784769409854734336 |
---|---|
author | Leal, Sidney Evaldo Lukasova, Katerina Carthery-Goulart, Maria Teresa Aluísio, Sandra Maria |
author_facet | Leal, Sidney Evaldo Lukasova, Katerina Carthery-Goulart, Maria Teresa Aluísio, Sandra Maria |
author_sort | Leal, Sidney Evaldo |
collection | PubMed |
description | This article presents RastrOS, a new eye-tracking corpus of eye movement data from university students during silent reading of paragraphs of texts in Brazilian Portuguese (BP). The article shows the potential of the corpus for natural language processing (NLP) using it to evaluate the sentence complexity prediction task in BP and it also focuses on the description of NLP resources and methods developed to create the corpus. Specifically, we present: (i) the method used to select the corpus paragraphs from large corpora, using linguistic metrics and clustering algorithms; (ii) the platform for collecting the Cloze test, which is also responsible for creating the project datasets, and (iii) the hybrid semantic similarity method, based on word embedding models and contextualised word representations, used to generate semantic predictability norms. RastrOS can be downloaded from the open science framework repository with the computational infrastructure mentioned above. Datasets with predictability norms of 393 participants and eye-tracking data of 37 participants are available in the OSF repository for this work (https://osf.io/9jxg3/). |
format | Online Article Text |
id | pubmed-9383681 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer Netherlands |
record_format | MEDLINE/PubMed |
spelling | pubmed-93836812022-08-17 RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese Leal, Sidney Evaldo Lukasova, Katerina Carthery-Goulart, Maria Teresa Aluísio, Sandra Maria Lang Resour Eval Project Notes This article presents RastrOS, a new eye-tracking corpus of eye movement data from university students during silent reading of paragraphs of texts in Brazilian Portuguese (BP). The article shows the potential of the corpus for natural language processing (NLP) using it to evaluate the sentence complexity prediction task in BP and it also focuses on the description of NLP resources and methods developed to create the corpus. Specifically, we present: (i) the method used to select the corpus paragraphs from large corpora, using linguistic metrics and clustering algorithms; (ii) the platform for collecting the Cloze test, which is also responsible for creating the project datasets, and (iii) the hybrid semantic similarity method, based on word embedding models and contextualised word representations, used to generate semantic predictability norms. RastrOS can be downloaded from the open science framework repository with the computational infrastructure mentioned above. Datasets with predictability norms of 393 participants and eye-tracking data of 37 participants are available in the OSF repository for this work (https://osf.io/9jxg3/). Springer Netherlands 2022-08-17 2022 /pmc/articles/PMC9383681/ /pubmed/35990365 http://dx.doi.org/10.1007/s10579-022-09609-0 Text en © The Author(s), under exclusive licence to Springer Nature B.V. 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Project Notes Leal, Sidney Evaldo Lukasova, Katerina Carthery-Goulart, Maria Teresa Aluísio, Sandra Maria RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese |
title | RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese |
title_sort | rastros project: natural language processing contributions to the development of an eye-tracking corpus with predictability norms for brazilian portuguese |
topic | Project Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9383681/ https://www.ncbi.nlm.nih.gov/pubmed/35990365 http://dx.doi.org/10.1007/s10579-022-09609-0 |