SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles

Bibliographic Details
Main authors: Cai, Qing; Brysbaert, Marc
Format: Text
Language: English
Published: Public Library of Science, 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880003/
https://www.ncbi.nlm.nih.gov/pubmed/20532192
http://dx.doi.org/10.1371/journal.pone.0010729
author Cai, Qing
Brysbaert, Marc
collection PubMed
description BACKGROUND: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to. METHODOLOGY: Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts. CONCLUSIONS: Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
format Text
id pubmed-2880003
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2880003 2010-06-07 SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles Cai, Qing Brysbaert, Marc PLoS One Research Article Public Library of Science 2010-06-02 /pmc/articles/PMC2880003/ /pubmed/20532192 http://dx.doi.org/10.1371/journal.pone.0010729 Text en Cai, Brysbaert. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880003/
https://www.ncbi.nlm.nih.gov/pubmed/20532192
http://dx.doi.org/10.1371/journal.pone.0010729