Long-Range Correlation Underlying Childhood Language and Generative Models

Long-range correlation, a property of time series exhibiting relevant statistical dependence between two distant subsequences, is mainly studied in the statistical physics domain and has been reported to exist in natural language. By using a state-of-the-art method for such analysis, long-range corr...

Bibliographic Details
Main Author: Tanaka-Ishii, Kumiko
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157415/
https://www.ncbi.nlm.nih.gov/pubmed/30283378
http://dx.doi.org/10.3389/fpsyg.2018.01725
_version_ 1783358267491090432
author Tanaka-Ishii, Kumiko
author_facet Tanaka-Ishii, Kumiko
author_sort Tanaka-Ishii, Kumiko
collection PubMed
description Long-range correlation, a property of time series exhibiting relevant statistical dependence between two distant subsequences, is mainly studied in the statistical physics domain and has been reported to exist in natural language. By using a state-of-the-art method for such analysis, long-range correlation is first shown to occur in long CHILDES data sets. To understand why, generative stochastic models of language, originally proposed in the cognitive scientific domain, are investigated. Among representative models, the Simon model is found to exhibit surprisingly good long-range correlation, but not the Pitman-Yor model. Because the Simon model is known not to correctly reflect the vocabulary growth of natural languages, a simple new model is devised as a conjunct of the Simon and Pitman-Yor models, such that long-range correlation holds with a correct vocabulary growth rate. The investigation overall suggests that uniform sampling is one cause of long-range correlation and could thus have some relation with actual linguistic processes.
format Online
Article
Text
id pubmed-6157415
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6157415 2018-10-03 Long-Range Correlation Underlying Childhood Language and Generative Models Tanaka-Ishii, Kumiko Front Psychol Psychology Frontiers Media S.A. 2018-09-19 /pmc/articles/PMC6157415/ /pubmed/30283378 http://dx.doi.org/10.3389/fpsyg.2018.01725 Text en Copyright © 2018 Tanaka-Ishii. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Psychology
Tanaka-Ishii, Kumiko
Long-Range Correlation Underlying Childhood Language and Generative Models
title Long-Range Correlation Underlying Childhood Language and Generative Models
title_full Long-Range Correlation Underlying Childhood Language and Generative Models
title_fullStr Long-Range Correlation Underlying Childhood Language and Generative Models
title_full_unstemmed Long-Range Correlation Underlying Childhood Language and Generative Models
title_short Long-Range Correlation Underlying Childhood Language and Generative Models
title_sort long-range correlation underlying childhood language and generative models
topic Psychology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157415/
https://www.ncbi.nlm.nih.gov/pubmed/30283378
http://dx.doi.org/10.3389/fpsyg.2018.01725
work_keys_str_mv AT tanakaishiikumiko longrangecorrelationunderlyingchildhoodlanguageandgenerativemodels