Benford’s Law applies to word frequency rank in English, German, French, Spanish, and Italian

Bibliographic Details
Main author: Golbeck, Jennifer
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501622/
https://www.ncbi.nlm.nih.gov/pubmed/37708112
http://dx.doi.org/10.1371/journal.pone.0291337
Journal: PLoS One
Article type: Research Article
Published online: 2023-09-14
Collection: PubMed (National Center for Biotechnology Information)
Record ID: pubmed-10501622 (record format: MEDLINE/PubMed)
License: © 2023 Jennifer Golbeck. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract: Benford’s Law states that, in many real-world data sets, the frequency with which a number’s first digit d (1 through 9) occurs is predicted by the formula log₁₀(1 + 1/d). Numbers beginning with 1 occur roughly 30% of the time and are more than six times as common as numbers beginning with 9. We show that Benford’s Law applies to the frequency rank of words in English, German, French, Spanish, and Italian. We calculated the frequency rank of words in the Google Ngram Viewer corpora. Then, using the first significant digit (FSD) of the frequency rank, we found that the FSD distribution adhered to the distribution expected under Benford’s Law. Across a series of additional corpora, drawn from sources ranging from news to books to social media, and across all the languages studied, we consistently found adherence to Benford’s Law. Furthermore, at the user level on social media, we found that Benford’s Law holds for the vast majority of users’ collected posts, and that significant deviations from Benford’s Law tend to be a mark of spam bots.
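
The rank-based Benford check described in the abstract can be illustrated with a short Python sketch. It computes the Benford shares P(d) = log₁₀(1 + 1/d) (so P(1) = log₁₀ 2 ≈ 0.301 and P(9) = log₁₀(10/9) ≈ 0.046, a ratio of about 6.6), ranks words by corpus frequency, takes the first significant digit (FSD) of each token’s rank, and measures the gap from Benford with the mean absolute deviation (MAD). Assumptions, not taken from the paper: every token is replaced by its word’s rank (so a word counts once per occurrence), ties are broken alphabetically, MAD stands in for whatever statistic the author used, the function names are hypothetical, and the Zipf-like word counts in the usage lines are synthetic rather than Ngram data.

```python
import math
from collections import Counter


def benford_expected() -> dict[int, float]:
    """Benford's Law: P(d) = log10(1 + 1/d) for first digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_significant_digit(n: int) -> int:
    """Leading non-zero digit of a positive integer (1234 -> 1)."""
    if n <= 0:
        raise ValueError("FSD is defined here only for positive integers")
    while n >= 10:
        n //= 10
    return n


def frequency_ranks(counts: Counter) -> dict[str, int]:
    """Rank words by descending count (rank 1 = most frequent).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered, start=1)}


def fsd_distribution_of_ranks(counts: Counter, ranks: dict[str, int]) -> dict[int, float]:
    """Share of tokens whose word's frequency rank starts with each digit.

    Assumption: each token is mapped to its word's rank, so a word
    contributes its full count, not one entry per vocabulary item.
    """
    tally = Counter()
    for word, count in counts.items():
        tally[first_significant_digit(ranks[word])] += count
    total = sum(tally.values())
    return {d: tally[d] / total for d in range(1, 10)}


def mean_absolute_deviation(observed: dict[int, float], expected: dict[int, float]) -> float:
    """Average |observed - expected| over the nine first digits."""
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


# Hypothetical Zipf-like data: word "w<r>" occurs about 100000 / r times.
counts = Counter({f"w{r}": 100_000 // r for r in range(1, 10_000)})
expected = benford_expected()
observed = fsd_distribution_of_ranks(counts, frequency_ranks(counts))
for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}  Benford {expected[d]:.3f}")
print(f"MAD = {mean_absolute_deviation(observed, expected):.4f}")
```

The token weighting is the important design choice. If each vocabulary entry counted once, the ranks 1 to 9999 would have perfectly uniform leading digits (1,111 ranks per digit). Weighting each rank r by a frequency proportional to 1/r makes the mass of ranks starting with d in each decade roughly proportional to ln((d + 1)/d), which is the Benford shape.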
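
The user-level finding in the abstract suggests a simple screening idea, sketched below in Python: score each user’s posts against a shared reference rank table and flag accounts whose rank-FSD distribution departs strongly from Benford’s Law. This is an illustrative heuristic, not the paper’s procedure. The MAD statistic, the 0.015 cutoff, the 200-token minimum, skipping tokens missing from the rank table, and the names `user_mad` and `flag_suspicious_users` are all hypothetical choices, and the users in the usage lines are synthetic. A flagged account is a candidate for review, not a confirmed bot.

```python
import math
from collections import Counter
from typing import Optional

# Benford's Law expected first-digit shares, P(d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n: int) -> int:
    """First significant digit of a positive integer."""
    return int(str(n)[0])


def user_mad(tokens: list[str], ranks: dict[str, int]) -> Optional[float]:
    """MAD between a user's rank-FSD distribution and Benford's Law.

    Tokens absent from the reference rank table are skipped (an assumption).
    Returns None when no token could be ranked.
    """
    digits = Counter(leading_digit(ranks[t]) for t in tokens if t in ranks)
    total = sum(digits.values())
    if total == 0:
        return None
    return sum(abs(digits[d] / total - BENFORD[d]) for d in range(1, 10)) / 9


def flag_suspicious_users(posts_by_user, ranks, threshold=0.015, min_tokens=200):
    """Return users whose deviation from Benford exceeds a hypothetical cutoff."""
    flagged = []
    for user, tokens in posts_by_user.items():
        if len(tokens) < min_tokens:
            continue  # too few tokens for a stable digit distribution
        mad = user_mad(tokens, ranks)
        if mad is not None and mad > threshold:
            flagged.append(user)
    return flagged


# Hypothetical reference table: word "w<r>" has frequency rank r.
ranks = {f"w{r}": r for r in range(1, 1000)}
# Synthetic users: Zipf-like word use, a bot repeating three words, a tiny account.
posts_by_user = {
    "human": [f"w{r}" for r in range(1, 1000) for _ in range(10_000 // r)],
    "bot": ["w5", "w50", "w500"] * 300,
    "tiny": ["w1"] * 10,  # below min_tokens, so never scored
}
print(flag_suspicious_users(posts_by_user, ranks))  # prints ['bot']
```

The repeated-word account scores high because every one of its ranks starts with 5, leaving the other eight digits empty, while the Zipf-like account stays close to the Benford shares.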