
How Many Is Enough?—Statistical Principles for Lexicostatistics

Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. There are two important yet not well-studied parameters in this approach: the conventional size of vocabulary list to collect potentially true cognates and the minimum matching instances required to co...

Bibliographic Details
Main authors: Zhang, Menghan; Gong, Tao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects: Psychology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5149542/
https://www.ncbi.nlm.nih.gov/pubmed/28018261
http://dx.doi.org/10.3389/fpsyg.2016.01916
author Zhang, Menghan
Gong, Tao
collection PubMed
description Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. There are two important yet not well-studied parameters in this approach: the conventional size of vocabulary list to collect potentially true cognates and the minimum matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate degree of relatedness between languages, and enable a frequency-based, dynamic threshold to detect recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared to the Swadesh 200-word list and other 100-word lists sampled randomly from the Swadesh 200-word list. All these provide mathematical support for applying lexicostatistics in historical and comparative linguistics.
format Online
Article
Text
id pubmed-5149542
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5149542
2016-12-23
How Many Is Enough?—Statistical Principles for Lexicostatistics
Zhang, Menghan
Gong, Tao
Front Psychol
Psychology
Frontiers Media S.A. 2016-12-12
/pmc/articles/PMC5149542/
/pubmed/28018261
http://dx.doi.org/10.3389/fpsyg.2016.01916
Text
en
Copyright © 2016 Zhang and Gong.
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title How Many Is Enough?—Statistical Principles for Lexicostatistics
topic Psychology