Benford’s Law applies to word frequency rank in English, German, French, Spanish, and Italian
Benford’s Law states that, in many real-world data sets, the frequency of numbers’ first digits is predicted by the formula log(1 + (1/d)). Numbers beginning with a 1 occur roughly 30% of the time, and are six times more common than numbers beginning with a 9. We show that Benford’s Law applies to t...
Main Author: | Golbeck, Jennifer |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501622/ https://www.ncbi.nlm.nih.gov/pubmed/37708112 http://dx.doi.org/10.1371/journal.pone.0291337 |
_version_ | 1785106150917668864 |
---|---|
author | Golbeck, Jennifer |
author_facet | Golbeck, Jennifer |
author_sort | Golbeck, Jennifer |
collection | PubMed |
description | Benford’s Law states that, in many real-world data sets, the frequency of numbers’ first digits is predicted by the formula log(1 + (1/d)). Numbers beginning with a 1 occur roughly 30% of the time, and are six times more common than numbers beginning with a 9. We show that Benford’s Law applies to the frequency rank of words in English, German, French, Spanish, and Italian. We calculated the frequency rank of words in the Google Ngram Viewer corpora. Then, using the first significant digit of the frequency rank, we found the FSD distribution adhered to the expected Benford’s Law distribution. Over a series of additional corpora from sources ranging from news to books to social media and across the languages studied, we consistently found adherence to Benford’s Law. Furthermore, at the user level on social media, we found Benford’s Law holds for the vast majority of users’ collected posts, and significant deviations from Benford’s Law tend to be a mark of spam bots. |
format | Online Article Text |
id | pubmed-10501622 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10501622 2023-09-15 Benford’s Law applies to word frequency rank in English, German, French, Spanish, and Italian Golbeck, Jennifer PLoS One Research Article Benford’s Law states that, in many real-world data sets, the frequency of numbers’ first digits is predicted by the formula log(1 + (1/d)). Numbers beginning with a 1 occur roughly 30% of the time, and are six times more common than numbers beginning with a 9. We show that Benford’s Law applies to the frequency rank of words in English, German, French, Spanish, and Italian. We calculated the frequency rank of words in the Google Ngram Viewer corpora. Then, using the first significant digit of the frequency rank, we found the FSD distribution adhered to the expected Benford’s Law distribution. Over a series of additional corpora from sources ranging from news to books to social media and across the languages studied, we consistently found adherence to Benford’s Law. Furthermore, at the user level on social media, we found Benford’s Law holds for the vast majority of users’ collected posts, and significant deviations from Benford’s Law tend to be a mark of spam bots. Public Library of Science 2023-09-14 /pmc/articles/PMC10501622/ /pubmed/37708112 http://dx.doi.org/10.1371/journal.pone.0291337 Text en © 2023 Jennifer Golbeck https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Golbeck, Jennifer Benford’s Law applies to word frequency rank in English, German, French, Spanish, and Italian |
title | Benford’s Law applies to word frequency rank in English, German, French, Spanish, and Italian |
title_sort | benford’s law applies to word frequency rank in english, german, french, spanish, and italian |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501622/ https://www.ncbi.nlm.nih.gov/pubmed/37708112 http://dx.doi.org/10.1371/journal.pone.0291337 |
work_keys_str_mv | AT golbeckjennifer benfordslawappliestowordfrequencyrankinenglishgermanfrenchspanishanditalian |
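
The description above gives the Benford formula and outlines a rank-based procedure. The following is a minimal Python sketch of that idea, not the article's actual code. It assumes the logarithm is base 10 (the standard form of Benford's Law) and assumes one plausible reading of the method, which the abstract does not spell out: every token occurrence in a text contributes the first significant digit of its word's frequency rank in a reference corpus. The function names `benford_expected`, `frequency_ranks`, and `rank_fsd_distribution` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit shares, P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_significant_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def frequency_ranks(reference_tokens):
    """Map each word to its frequency rank (1 = most frequent).

    Ties are broken by first appearance, which is how Counter.most_common
    orders equal counts. This tie rule is an assumption of the sketch.
    """
    counts = Counter(reference_tokens)
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def rank_fsd_distribution(tokens, ranks):
    """Share of tokens whose word rank starts with each digit 1..9.

    Assumption: each token occurrence counts once, and tokens missing
    from the reference ranking are skipped.
    """
    digits = Counter(first_significant_digit(ranks[t])
                     for t in tokens if t in ranks)
    total = sum(digits.values())
    if total == 0:
        return {d: 0.0 for d in range(1, 10)}
    return {d: digits.get(d, 0) / total for d in range(1, 10)}


if __name__ == "__main__":
    expected = benford_expected()
    # A toy "corpus"; a real test would use a large reference word list.
    text = "the cat and the dog and the bird saw the cat".split()
    observed = rank_fsd_distribution(text, frequency_ranks(text))
    for d in range(1, 10):
        print(d, round(expected[d], 3), round(observed[d], 3))
```

Under the base-10 assumption, the formula gives about 0.301 for a leading 1 and about 0.046 for a leading 9, a ratio of roughly 6.6, consistent with the "roughly 30%" and "six times" figures in the description. A toy text like the one in the usage block is far too small to show Benford-like behavior; it only demonstrates the mechanics.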