The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions

Bibliographic Details
Main Authors: Futrell, Richard; Gibson, Edward; Tily, Harry J.; Blank, Idan; Vishnevetsky, Anastasia; Piantadosi, Steven T.; Fedorenko, Evelina
Format: Online Article Text
Language: English
Published: Springer Netherlands 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549930/
https://www.ncbi.nlm.nih.gov/pubmed/34720781
http://dx.doi.org/10.1007/s10579-020-09503-7
author Futrell, Richard
Gibson, Edward
Tily, Harry J.
Blank, Idan
Vishnevetsky, Anastasia
Piantadosi, Steven T.
Fedorenko, Evelina
collection PubMed
description It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally-occurring text, do not contain many of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data.
format Online
Article
Text
id pubmed-8549930
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-8549930 2021-10-29 The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions Futrell, Richard; Gibson, Edward; Tily, Harry J.; Blank, Idan; Vishnevetsky, Anastasia; Piantadosi, Steven T.; Fedorenko, Evelina Lang Resour Eval Original Paper It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally-occurring text, do not contain many of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data. Springer Netherlands 2020-09-04 2021 /pmc/articles/PMC8549930/ /pubmed/34720781 http://dx.doi.org/10.1007/s10579-020-09503-7 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549930/
https://www.ncbi.nlm.nih.gov/pubmed/34720781
http://dx.doi.org/10.1007/s10579-020-09503-7