A large-scaled corpus for assessing text readability

This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~ 5000 text excerpts along with information about the excerpt’s year of publishing, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse proces...

Bibliographic Details
Main authors: Crossley, Scott, Heintz, Aron, Choi, Joon Suh, Batchelor, Jordan, Karimi, Mehrnoush, Malatinszky, Agnes
Format: Online Article Text
Language: English
Published: Springer US 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027808/
https://www.ncbi.nlm.nih.gov/pubmed/35297016
http://dx.doi.org/10.3758/s13428-022-01802-x
_version_ 1784909794222538752
author Crossley, Scott
Heintz, Aron
Choi, Joon Suh
Batchelor, Jordan
Karimi, Mehrnoush
Malatinszky, Agnes
collection PubMed
description This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~ 5000 text excerpts along with information about the excerpt’s year of publishing, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus includes a number of improvements in comparison to previous readability corpora including size, breadth of the excerpts available, which cover over 250 years of writing in two different genres, and unique readability criterion provided for each text based on teachers’ ratings of text difficulty for student readers. This paper discusses the development of the corpus and presents reliability metrics for the human ratings of readability.
format Online
Article
Text
id pubmed-10027808
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-10027808 2023-03-22 A large-scaled corpus for assessing text readability Crossley, Scott Heintz, Aron Choi, Joon Suh Batchelor, Jordan Karimi, Mehrnoush Malatinszky, Agnes Behav Res Methods Article This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~ 5000 text excerpts along with information about the excerpt’s year of publishing, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus includes a number of improvements in comparison to previous readability corpora including size, breadth of the excerpts available, which cover over 250 years of writing in two different genres, and unique readability criterion provided for each text based on teachers’ ratings of text difficulty for student readers. This paper discusses the development of the corpus and presents reliability metrics for the human ratings of readability. Springer US 2022-03-16 2023 /pmc/articles/PMC10027808/ /pubmed/35297016 http://dx.doi.org/10.3758/s13428-022-01802-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title A large-scaled corpus for assessing text readability
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027808/
https://www.ncbi.nlm.nih.gov/pubmed/35297016
http://dx.doi.org/10.3758/s13428-022-01802-x