How Many Is Enough?—Statistical Principles for Lexicostatistics
Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. There are two important yet not well-studied parameters in this approach: the conventional size of vocabulary list to collect potentially true cognates and the minimum matching instances required to co...
Main Authors: | Zhang, Menghan, Gong, Tao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2016 |
Subjects: | Psychology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5149542/ https://www.ncbi.nlm.nih.gov/pubmed/28018261 http://dx.doi.org/10.3389/fpsyg.2016.01916 |
_version_ | 1782474025495166976 |
---|---|
author | Zhang, Menghan Gong, Tao |
author_facet | Zhang, Menghan Gong, Tao |
author_sort | Zhang, Menghan |
collection | PubMed |
description | Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. There are two important yet not well-studied parameters in this approach: the conventional size of vocabulary list to collect potentially true cognates and the minimum matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate degree of relatedness between languages, and enable a frequency-based, dynamic threshold to detect recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared to the Swadesh 200-word list and other 100-word lists sampled randomly from the Swadesh 200-word list. All these provide mathematical support for applying lexicostatistics in historical and comparative linguistics. |
format | Online Article Text |
id | pubmed-5149542 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-51495422016-12-23 How Many Is Enough?—Statistical Principles for Lexicostatistics Zhang, Menghan Gong, Tao Front Psychol Psychology Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. There are two important yet not well-studied parameters in this approach: the conventional size of vocabulary list to collect potentially true cognates and the minimum matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate degree of relatedness between languages, and enable a frequency-based, dynamic threshold to detect recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared to the Swadesh 200-word list and other 100-word lists sampled randomly from the Swadesh 200-word list. All these provide mathematical support for applying lexicostatistics in historical and comparative linguistics. Frontiers Media S.A. 2016-12-12 /pmc/articles/PMC5149542/ /pubmed/28018261 http://dx.doi.org/10.3389/fpsyg.2016.01916 Text en Copyright © 2016 Zhang and Gong. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Psychology Zhang, Menghan Gong, Tao How Many Is Enough?—Statistical Principles for Lexicostatistics |
title | How Many Is Enough?—Statistical Principles for Lexicostatistics |
title_full | How Many Is Enough?—Statistical Principles for Lexicostatistics |
title_fullStr | How Many Is Enough?—Statistical Principles for Lexicostatistics |
title_full_unstemmed | How Many Is Enough?—Statistical Principles for Lexicostatistics |
title_short | How Many Is Enough?—Statistical Principles for Lexicostatistics |
title_sort | how many is enough?—statistical principles for lexicostatistics |
topic | Psychology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5149542/ https://www.ncbi.nlm.nih.gov/pubmed/28018261 http://dx.doi.org/10.3389/fpsyg.2016.01916 |
work_keys_str_mv | AT zhangmenghan howmanyisenoughstatisticalprinciplesforlexicostatistics AT gongtao howmanyisenoughstatisticalprinciplesforlexicostatistics |