
Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms

The Google Books Ngram Viewer (Google Ngram) is a search engine that charts word frequencies from a large corpus of books and thereby allows for the examination of cultural change as it is reflected in books. While the tool’s massive corpus of data (about 8 million books or 6% of all books ever publ...


Bibliographic Details
Main Authors: Younes, Nadja; Reips, Ulf-Dietrich
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6430395/
https://www.ncbi.nlm.nih.gov/pubmed/30901329
http://dx.doi.org/10.1371/journal.pone.0213554
_version_ 1783405764880105472
author Younes, Nadja
Reips, Ulf-Dietrich
author_facet Younes, Nadja
Reips, Ulf-Dietrich
author_sort Younes, Nadja
collection PubMed
description The Google Books Ngram Viewer (Google Ngram) is a search engine that charts word frequencies from a large corpus of books and thereby allows for the examination of cultural change as it is reflected in books. While the tool’s massive corpus of data (about 8 million books or 6% of all books ever published) has been used in various scientific studies, concerns about the accuracy of results have simultaneously emerged. This paper reviews the literature and serves as a guideline for improving Google Ngram studies by suggesting five methodological procedures suited to increase the reliability of results. In particular, we recommend the use of (I) different language corpora, (II) cross-checks on different corpora from the same language, (III) word inflections, (IV) synonyms, and (V) a standardization procedure that accounts for both the influx of data and unequal weights of word frequencies. Further, we outline how to combine these procedures and address the risk of potential biases arising from censorship and propaganda. As an example of the proposed procedures, we examine the cross-cultural expression of religion via religious terms for the years 1900 to 2000. Special emphasis is placed on the situation during World War II. In line with the strand of literature that emphasizes the decline of collectivistic values, our results suggest an overall decrease of religion’s importance. However, religion regains importance during times of crisis such as World War II. By comparing the results obtained through the different methods, we illustrate that applying and particularly combining our suggested procedures increases the reliability of results and prevents authors from deriving wrong assumptions.
format Online
Article
Text
id pubmed-6430395
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6430395 2019-04-01 Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms Younes, Nadja Reips, Ulf-Dietrich PLoS One Research Article
Public Library of Science 2019-03-22 /pmc/articles/PMC6430395/ /pubmed/30901329 http://dx.doi.org/10.1371/journal.pone.0213554 Text en © 2019 Younes, Reips http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Younes, Nadja
Reips, Ulf-Dietrich
Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms
title Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms
topic Research Article