Zipf’s law holds for phrases, not words
With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrase...
Main authors: | Ryland Williams, Jake; Lessard, Paul R.; Desu, Suma; Clark, Eric M.; Bagrow, James P.; Danforth, Christopher M.; Sheridan Dodds, Peter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531284/ https://www.ncbi.nlm.nih.gov/pubmed/26259699 http://dx.doi.org/10.1038/srep12209 |
_version_ | 1782385018583121920 |
---|---|
author | Ryland Williams, Jake; Lessard, Paul R.; Desu, Suma; Clark, Eric M.; Bagrow, James P.; Danforth, Christopher M.; Sheridan Dodds, Peter |
author_facet | Ryland Williams, Jake; Lessard, Paul R.; Desu, Suma; Clark, Eric M.; Bagrow, James P.; Danforth, Christopher M.; Sheridan Dodds, Peter |
author_sort | Ryland Williams, Jake |
collection | PubMed |
description | With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf’s law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases. |
format | Online Article Text |
id | pubmed-4531284 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-4531284 2015-08-11 Zipf’s law holds for phrases, not words Ryland Williams, Jake; Lessard, Paul R.; Desu, Suma; Clark, Eric M.; Bagrow, James P.; Danforth, Christopher M.; Sheridan Dodds, Peter Sci Rep Article With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf’s law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases. Nature Publishing Group 2015-08-11 /pmc/articles/PMC4531284/ /pubmed/26259699 http://dx.doi.org/10.1038/srep12209 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Ryland Williams, Jake; Lessard, Paul R.; Desu, Suma; Clark, Eric M.; Bagrow, James P.; Danforth, Christopher M.; Sheridan Dodds, Peter Zipf’s law holds for phrases, not words |
title | Zipf’s law holds for phrases, not words |
title_full | Zipf’s law holds for phrases, not words |
title_fullStr | Zipf’s law holds for phrases, not words |
title_full_unstemmed | Zipf’s law holds for phrases, not words |
title_short | Zipf’s law holds for phrases, not words |
title_sort | zipf’s law holds for phrases, not words |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531284/ https://www.ncbi.nlm.nih.gov/pubmed/26259699 http://dx.doi.org/10.1038/srep12209 |
work_keys_str_mv | AT rylandwilliamsjake zipfslawholdsforphrasesnotwords AT lessardpaulr zipfslawholdsforphrasesnotwords AT desusuma zipfslawholdsforphrasesnotwords AT clarkericm zipfslawholdsforphrasesnotwords AT bagrowjamesp zipfslawholdsforphrasesnotwords AT danforthchristopherm zipfslawholdsforphrasesnotwords AT sheridandoddspeter zipfslawholdsforphrasesnotwords |
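
The description above mentions random text partitioning and a rank ordering of mixed length phrases, but this record does not spell out the procedure. The following Python sketch is one simple reading of that idea, not the authors' method: it assumes each gap between adjacent words is cut independently with probability `q`, and then ranks the resulting phrases by frequency. The names `random_partition`, `rank_frequency`, and the parameter `q` are hypothetical, chosen for illustration only.

```python
import random
from collections import Counter


def random_partition(words, q=0.5, rng=None):
    """Split a word sequence into phrases.

    Assumption (not from the record): every gap between two adjacent
    words is cut independently with probability q.
    """
    rng = rng or random.Random()
    if not words:
        return []
    phrases = []
    current = [words[0]]
    for word in words[1:]:
        if rng.random() < q:
            # Cut here: close the current phrase and start a new one.
            phrases.append(" ".join(current))
            current = [word]
        else:
            # No cut: the word joins the current phrase.
            current.append(word)
    phrases.append(" ".join(current))
    return phrases


def rank_frequency(items):
    """Return (rank, item, frequency) tuples, most frequent first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(items)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, item, freq)
            for rank, (item, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat sat on the hat"
    phrases = random_partition(text.split(), q=0.5, rng=random.Random(0))
    for rank, phrase, freq in rank_frequency(phrases):
        print(rank, freq, phrase)
```

A single random partition of a short text is noisy. To look for Zipf-like scaling, one would typically plot frequency against rank on log-log axes for a large corpus, and average over many partitions or compute expected phrase frequencies.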