Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes

Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages such as Chinese, Japanese and Korean. These languages are written with characters and have very limited dictionary sizes. Extensive experi...

Bibliographic Details
Main Authors: Lü, Linyuan, Zhang, Zi-Ke, Zhou, Tao
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558701/
https://www.ncbi.nlm.nih.gov/pubmed/23378896
http://dx.doi.org/10.1038/srep01082
_version_ 1782257468051554304
author Lü, Linyuan
Zhang, Zi-Ke
Zhou, Tao
author_facet Lü, Linyuan
Zhang, Zi-Ke
Zhou, Tao
author_sort Lü, Linyuan
collection PubMed
description Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages such as Chinese, Japanese and Korean. These languages are written with characters and have very limited dictionary sizes. Extensive experiments show that: (i) The character frequency distribution follows a power law with exponent close to one, at which the corresponding Zipf's exponent diverges. Indeed, the character frequency decays exponentially in the Zipf plot. (ii) The number of distinct characters grows with the text length in three stages: it grows linearly at first, then logarithmically, and eventually saturates. A theoretical model of the writing process is proposed, which embodies the rich-get-richer mechanism and the effects of limited dictionary size. Experiments, simulations and analytical solutions agree well with each other. This work refines the understanding of Zipf's and Heaps' laws in human language systems.
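The three-stage growth described in the abstract can be illustrated with a small simulation. The sketch below is an assumption for illustration only, not the authors' model as specified in the article: it draws each new character from a finite dictionary with probability proportional to (count so far + eps), which combines a rich-get-richer rule with a hard cap on the number of distinct characters. The function name simulate_text and the parameters dict_size, eps and seed are hypothetical.

```python
import random


def simulate_text(length, dict_size, eps=0.1, seed=0):
    """Hypothetical finite-dictionary rich-get-richer process (illustrative only).

    At step t, character i is chosen with probability
    (count_i + eps) / (t + eps * dict_size).
    Returns the per-character counts and the number of distinct
    characters seen after each step (a Heaps-style growth curve).
    """
    rng = random.Random(seed)
    counts = [0] * dict_size   # occurrences of each character so far
    history = []               # sequence of characters written so far
    growth = []                # distinct characters after each step
    distinct = 0
    for t in range(length):
        total = t + eps * dict_size
        if rng.random() < t / total:
            # Rich-get-richer: copy a character from the text so far,
            # which selects character i with weight count_i.
            ch = history[rng.randrange(t)]
        else:
            # Baseline weight eps: pick uniformly from the whole dictionary.
            ch = rng.randrange(dict_size)
        if counts[ch] == 0:
            distinct += 1
        counts[ch] += 1
        history.append(ch)
        growth.append(distinct)
    return counts, growth


if __name__ == "__main__":
    counts, growth = simulate_text(length=100_000, dict_size=3000)
    # Print the growth curve at a few text lengths.
    for t in (10, 100, 1_000, 10_000, 100_000):
        print(t, growth[t - 1])
    # Rank-ordered counts give a Zipf-style plot when drawn on log axes.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    print(ranked[:10])
```

Because the dictionary is finite, the growth curve can never exceed dict_size, so it must flatten at large text lengths; how closely the intermediate regime resembles the logarithmic stage reported in the abstract depends on the chosen eps and dict_size.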
format Online
Article
Text
id pubmed-3558701
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-35587012013-02-01 Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes Lü, Linyuan Zhang, Zi-Ke Zhou, Tao Sci Rep Article Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in Indo-European language family, but it does not hold for languages like Chinese, Japanese and Korean. These languages consist of characters, and are of very limited dictionary sizes. Extensive experiments show that: (i) The character frequency distribution follows a power law with exponent close to one, at which the corresponding Zipf's exponent diverges. Indeed, the character frequency decays exponentially in the Zipf's plot. (ii) The number of distinct characters grows with the text length in three stages: It grows linearly in the beginning, then turns to a logarithmical form, and eventually saturates. A theoretical model for writing process is proposed, which embodies the rich-get-richer mechanism and the effects of limited dictionary size. Experiments, simulations and analytical solutions agree well with each other. This work refines the understanding about Zipf's and Heaps' laws in human language systems. Nature Publishing Group 2013-01-30 /pmc/articles/PMC3558701/ /pubmed/23378896 http://dx.doi.org/10.1038/srep01082 Text en Copyright © 2013, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by/3.0/ This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
spellingShingle Article
Lü, Linyuan
Zhang, Zi-Ke
Zhou, Tao
Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
title Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
title_full Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
title_fullStr Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
title_full_unstemmed Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
title_short Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes
title_sort deviation of zipf's and heaps' laws in human languages with limited dictionary sizes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558701/
https://www.ncbi.nlm.nih.gov/pubmed/23378896
http://dx.doi.org/10.1038/srep01082
work_keys_str_mv AT lulinyuan deviationofzipfsandheapslawsinhumanlanguageswithlimiteddictionarysizes
AT zhangzike deviationofzipfsandheapslawsinhumanlanguageswithlimiteddictionarysizes
AT zhoutao deviationofzipfsandheapslawsinhumanlanguageswithlimiteddictionarysizes