Word Segmentation Cues in German Child-Directed Speech: A Corpus Analysis

To acquire language, infants must learn to segment words from running speech. A significant body of experimental research shows that infants use multiple cues to do so; however, little research has comprehensively examined the distribution of such cues in naturalistic speech. We conducted a comprehe...

Bibliographic Details
Main authors: Stärk, Katja; Kidd, Evan; Frost, Rebecca L. A.
Format: Online Article Text
Language: English
Published: SAGE Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8886305/
https://www.ncbi.nlm.nih.gov/pubmed/33517856
http://dx.doi.org/10.1177/0023830920979016
_version_ 1784660639900237824
author Stärk, Katja
Kidd, Evan
Frost, Rebecca L. A.
collection PubMed
description To acquire language, infants must learn to segment words from running speech. A significant body of experimental research shows that infants use multiple cues to do so; however, little research has comprehensively examined the distribution of such cues in naturalistic speech. We conducted a comprehensive corpus analysis of German child-directed speech (CDS) using data from the Child Language Data Exchange System (CHILDES) database, investigating the availability of word stress, transitional probabilities (TPs), and lexical and sublexical frequencies as potential cues for word segmentation. Seven hours of data (~15,000 words) were coded, representing around an average day of speech to infants. The analysis revealed that for 97% of words, primary stress was carried by the initial syllable, implicating stress as a reliable cue to word onset in German CDS. Word identity was also marked by TPs between syllables, which were higher within than between words, and higher for backwards than forwards transitions. Words followed a Zipfian-like frequency distribution, and over two-thirds of words (78%) were monosyllabic. Of the 50 most frequent words, 82% were function words, which accounted for 47% of word tokens in the entire corpus. Finally, 15% of all utterances comprised single words. These results give rich novel insights into the availability of segmentation cues in German CDS, and support the possibility that infants draw on multiple converging cues to segment their input. The data, which we make openly available to the research community, will help guide future experimental investigations on this topic.
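The transitional-probability cue named in the description can be illustrated with a minimal sketch. This is not the authors' procedure: the toy utterances, the syllabification, and the function names `syllable_pairs`, `transitional_probabilities`, and `mean_tp` are all invented for illustration. It assumes the common definitions forward TP(A→B) = freq(AB) / freq(A) and backward TP(A→B) = freq(AB) / freq(B), with frequencies counted over adjacent syllable pairs inside an utterance.

```python
from collections import Counter


def syllable_pairs(utterances):
    """Yield (first, second, within_word) for every adjacent syllable pair.

    Each utterance is a list of words; each word is a list of syllables.
    Pairs never span an utterance boundary.
    """
    for utterance in utterances:
        flat = [(syl, idx) for idx, word in enumerate(utterance) for syl in word]
        for (a, word_a), (b, word_b) in zip(flat, flat[1:]):
            yield a, b, word_a == word_b


def transitional_probabilities(utterances):
    """Return the pair tokens plus forward and backward TPs per pair type."""
    pairs = list(syllable_pairs(utterances))
    pair_counts = Counter((a, b) for a, b, _ in pairs)
    first_counts = Counter(a for a, _, _ in pairs)   # A as first member of a pair
    second_counts = Counter(b for _, b, _ in pairs)  # B as second member of a pair
    forward = {p: n / first_counts[p[0]] for p, n in pair_counts.items()}
    backward = {p: n / second_counts[p[1]] for p, n in pair_counts.items()}
    return pairs, forward, backward


def mean_tp(pairs, tp, within_word):
    """Average TP over pair tokens that are within (or between) words."""
    values = [tp[(a, b)] for a, b, w in pairs if w == within_word]
    return sum(values) / len(values) if values else float("nan")


# Invented toy input (hypothetical syllabification), not data from the corpus.
utterances = [
    [["ma", "ma"], ["kommt"]],
    [["pa", "pa"], ["kommt"]],
    [["ma", "ma"], ["lacht"]],
]

pairs, forward, backward = transitional_probabilities(utterances)
print("forward  within/between:", mean_tp(pairs, forward, True), mean_tp(pairs, forward, False))
print("backward within/between:", mean_tp(pairs, backward, True), mean_tp(pairs, backward, False))
```

In this toy input both measures come out higher within words than across word boundaries, which is the kind of contrast the description reports for the German CDS corpus. Real analyses may normalise by overall syllable frequency rather than by pair-position counts; that choice is an assumption here.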
format Online
Article
Text
id pubmed-8886305
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-8886305 2022-03-02 Word Segmentation Cues in German Child-Directed Speech: A Corpus Analysis Stärk, Katja Kidd, Evan Frost, Rebecca L. A. Lang Speech Articles
SAGE Publications 2021-01-30 2022-03 /pmc/articles/PMC8886305/ /pubmed/33517856 http://dx.doi.org/10.1177/0023830920979016 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Word Segmentation Cues in German Child-Directed Speech: A Corpus Analysis
topic Articles