The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions
It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on natur...
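Main authors: | Futrell, Richard; Gibson, Edward; Tily, Harry J.; Blank, Idan; Vishnevetsky, Anastasia; Piantadosi, Steven T.; Fedorenko, Evelina |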
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Netherlands 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549930/ https://www.ncbi.nlm.nih.gov/pubmed/34720781 http://dx.doi.org/10.1007/s10579-020-09503-7 |
_version_ | 1784590855747665920 |
---|---|
author | Futrell, Richard; Gibson, Edward; Tily, Harry J.; Blank, Idan; Vishnevetsky, Anastasia; Piantadosi, Steven T.; Fedorenko, Evelina |
author_facet | Futrell, Richard; Gibson, Edward; Tily, Harry J.; Blank, Idan; Vishnevetsky, Anastasia; Piantadosi, Steven T.; Fedorenko, Evelina |
author_sort | Futrell, Richard |
collection | PubMed |
description | It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally-occurring text, do not contain many of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data. |
format | Online Article Text |
id | pubmed-8549930 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Springer Netherlands |
record_format | MEDLINE/PubMed |
spelling | pubmed-8549930 2021-10-29 The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions Futrell, Richard Gibson, Edward Tily, Harry J. Blank, Idan Vishnevetsky, Anastasia Piantadosi, Steven T. Fedorenko, Evelina Lang Resour Eval Original Paper It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally-occurring text, do not contain many of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data. Springer Netherlands 2020-09-04 2021 /pmc/articles/PMC8549930/ /pubmed/34720781 http://dx.doi.org/10.1007/s10579-020-09503-7 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Original Paper Futrell, Richard Gibson, Edward Tily, Harry J. Blank, Idan Vishnevetsky, Anastasia Piantadosi, Steven T. Fedorenko, Evelina The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions |
title | The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions |
title_full | The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions |
title_fullStr | The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions |
title_full_unstemmed | The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions |
title_short | The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions |
title_sort | natural stories corpus: a reading-time corpus of english texts containing rare syntactic constructions |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549930/ https://www.ncbi.nlm.nih.gov/pubmed/34720781 http://dx.doi.org/10.1007/s10579-020-09503-7 |