
Assessing the Usefulness of Google Books’ Word Frequencies for Psycholinguistic Research on Word Processing

In this Perspective Article we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States a...


Bibliographic Details
Main authors: Brysbaert, Marc; Keuleers, Emmanuel; New, Boris
Format: Online Article Text
Language: English
Published: Frontiers Research Foundation, 2011
Subjects: Psychology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111095/
https://www.ncbi.nlm.nih.gov/pubmed/21713191
http://dx.doi.org/10.3389/fpsyg.2011.00027
author Brysbaert, Marc
Keuleers, Emmanuel
New, Boris
collection PubMed
description In this Perspective Article we assess the usefulness of Google's new word frequencies for word recognition research (lexical decision and word naming). We find that, despite the massive corpus on which the Google estimates are based (131 billion words from books published in the United States alone), the Google American English frequencies explain 11% less of the variance in the lexical decision times from the English Lexicon Project (Balota et al., 2007) than the SUBTLEX-US word frequencies, based on a corpus of 51 million words from film and television subtitles. Further analyses indicate that word frequencies derived from recent books (published after 2000) are better predictors of word processing times than frequencies based on the full corpus, and that word frequencies based on fiction books predict word processing times better than word frequencies based on the full corpus. The most predictive word frequencies from Google still do not explain more of the variance in word recognition times of undergraduate students and old adults than the subtitle-based word frequencies.
format Online
Article
Text
id pubmed-3111095
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Frontiers Research Foundation
record_format MEDLINE/PubMed
spelling pubmed-3111095 2011-06-27. Assessing the Usefulness of Google Books’ Word Frequencies for Psycholinguistic Research on Word Processing. Brysbaert, Marc; Keuleers, Emmanuel; New, Boris. Front Psychol. Psychology. Frontiers Research Foundation, 2011-03-02. /pmc/articles/PMC3111095/ /pubmed/21713191 http://dx.doi.org/10.3389/fpsyg.2011.00027 Text en. Copyright © 2011 Brysbaert, Keuleers and New. http://www.frontiersin.org/licenseagreement This is an open-access article subject to an exclusive license agreement between the authors and Frontiers Media SA, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
title Assessing the Usefulness of Google Books’ Word Frequencies for Psycholinguistic Research on Word Processing
topic Psychology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111095/
https://www.ncbi.nlm.nih.gov/pubmed/21713191
http://dx.doi.org/10.3389/fpsyg.2011.00027