Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
Main authors: | Lü, Linyuan; Zhang, Zi-Ke; Zhou, Tao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558701/ https://www.ncbi.nlm.nih.gov/pubmed/23378896 http://dx.doi.org/10.1038/srep01082 |
author | Lü, Linyuan; Zhang, Zi-Ke; Zhou, Tao |
collection | PubMed |
description | Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages such as Chinese, Japanese and Korean. These languages are composed of characters and have very limited dictionary sizes. Extensive experiments show that: (i) the character frequency distribution follows a power law with an exponent close to one, at which the corresponding Zipf exponent diverges; indeed, the character frequency decays exponentially in the Zipf plot. (ii) The number of distinct characters grows with the text length in three stages: it grows linearly at the beginning, then turns to a logarithmic form, and eventually saturates. A theoretical model of the writing process is proposed, which embodies the rich-get-richer mechanism and the effects of a limited dictionary size. Experiments, simulations and analytical solutions agree well with each other. This work refines the understanding of Zipf's and Heaps' laws in human language systems. |
id | pubmed-3558701 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Sci Rep |
date | 2013-01-30 |
rights | Copyright © 2013, Macmillan Publishers Limited. All rights reserved. This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ |
topic | Article |
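The abstract describes the writing model only in words: a rich-get-richer mechanism combined with a limited dictionary. The sketch below is a hypothetical model with those two ingredients, not the authors' model. It assumes a Pólya-urn rule in which each of V dictionary characters is written with probability proportional to n_i + a, where n_i is the character's count so far and a > 0 is an assumed "attractiveness" parameter. The names `simulate_writing` and `mean_field_distinct` are also hypothetical. Under this assumed rule, the expected number of distinct characters obeys dN/dt = a(V - N)/(aV + t), giving N(t) = V[1 - (aV/(aV + t))^a]. This expression qualitatively shows the three stages named in the abstract: N ≈ t for t ≪ aV, N ≈ aV ln(1 + t/(aV)) while a is small and N ≪ V, and N → V as t grows.

```python
import random


def simulate_writing(text_length, dictionary_size, attractiveness, seed=None):
    """Hypothetical rich-get-richer writing model with a finite dictionary.

    Character i is written with probability (n_i + a) / (a*V + t), where n_i
    is its count so far, a = attractiveness > 0 and V = dictionary_size.
    Returns N(t), the number of distinct characters after each token.
    """
    if attractiveness <= 0:
        raise ValueError("attractiveness must be positive")
    rng = random.Random(seed)
    base_weight = attractiveness * dictionary_size  # total weight of the "+a" terms
    text = []      # character ids written so far
    seen = set()   # distinct character ids used so far
    distinct = []
    for t in range(text_length):
        # Total weight is a*V + t. With probability a*V / (a*V + t), draw
        # uniformly from the whole dictionary. Otherwise copy a uniformly chosen
        # earlier token, which selects character i with probability n_i / t.
        if rng.random() * (base_weight + t) < base_weight:
            char = rng.randrange(dictionary_size)
        else:
            char = text[rng.randrange(t)]
        text.append(char)
        seen.add(char)
        distinct.append(len(seen))
    return distinct


def mean_field_distinct(t, dictionary_size, attractiveness):
    """Solution of dN/dt = a(V - N)/(aV + t) with N(0) = 0 (assumed model)."""
    base_weight = attractiveness * dictionary_size
    return dictionary_size * (1.0 - (base_weight / (base_weight + t)) ** attractiveness)


if __name__ == "__main__":
    V, a, T = 500, 0.5, 200_000  # illustrative parameter values, not from the paper
    curve = simulate_writing(T, V, a, seed=1)
    for t in (10, 100, 1_000, 10_000, 100_000, T):
        print(f"t={t:>7}  simulated N={curve[t - 1]:>3}  "
              f"mean-field N={mean_field_distinct(t, V, a):6.1f}")
```

Copying a uniformly chosen earlier token is a standard constant-time way to sample proportionally to the counts n_i, so long texts stay cheap to simulate. The dictionary bound V is what makes N(t) saturate; without it, the same rule would keep producing new characters indefinitely.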