Zipf’s law holds for phrases, not words

With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrase...
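
The full description in this record speaks of a rank ordering of mixed-length phrases obtained by random text partitioning. As a rough illustration only, and not the authors' method, the Python sketch below ranks words by frequency and then ranks phrases produced by a single random partition of the same text. The function names, the cut probability `q`, and the sample sentence are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def rank_frequency(items):
    """Return (rank, item, count) triples, most frequent first (rank starts at 1)."""
    counts = Counter(items)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, item, count) for rank, (item, count) in enumerate(ordered, start=1)]


def random_partition(words, q=0.5, rng=None):
    """Split a word sequence into phrases, cutting after each word with probability q.

    q is an assumed parameter for this sketch: q=1 yields single words,
    q=0 yields the whole text as one phrase.
    """
    if rng is None:
        rng = random.Random()
    phrases, current = [], []
    for word in words:
        current.append(word)
        if rng.random() < q:
            phrases.append(" ".join(current))
            current = []
    if current:  # keep any words left over after the last cut
        phrases.append(" ".join(current))
    return phrases


# Hypothetical sample text, far too small to show real scaling behaviour.
text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()

word_ranks = rank_frequency(words)
phrase_ranks = rank_frequency(random_partition(words, q=0.5, rng=random.Random(0)))
```

On a large corpus, plotting count against rank on log-log axes for `word_ranks` and `phrase_ranks` is one way to compare how far each rank-frequency curve follows a straight line. A single random partition is noisy, so any such comparison would normally average over many partitions.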

Bibliographic Details
Main authors: Ryland Williams, Jake; Lessard, Paul R.; Desu, Suma; Clark, Eric M.; Bagrow, James P.; Danforth, Christopher M.; Sheridan Dodds, Peter
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531284/
https://www.ncbi.nlm.nih.gov/pubmed/26259699
http://dx.doi.org/10.1038/srep12209
author Ryland Williams, Jake
Lessard, Paul R.
Desu, Suma
Clark, Eric M.
Bagrow, James P.
Danforth, Christopher M.
Sheridan Dodds, Peter
author_sort Ryland Williams, Jake
collection PubMed
description With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf’s law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.
format Online
Article
Text
id pubmed-4531284
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4531284
2015-08-11
Zipf’s law holds for phrases, not words
Sci Rep
Article
Nature Publishing Group
2015-08-11
/pmc/articles/PMC4531284/
/pubmed/26259699
http://dx.doi.org/10.1038/srep12209
Text
en
Copyright © 2015, Macmillan Publishers Limited
http://creativecommons.org/licenses/by/4.0/
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Zipf’s law holds for phrases, not words
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531284/
https://www.ncbi.nlm.nih.gov/pubmed/26259699
http://dx.doi.org/10.1038/srep12209