
ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings

Recordings captured by wearable microphones are a standard method for investigating young children’s language environments. A key measure to quantify from such data is the amount of speech present in children’s home environments. To this end, the LENA recorder and software—a popular system for measu...


Bibliographic Details
Main Authors: Räsänen, Okko; Seshadri, Shreyas; Lavechin, Marvin; Cristia, Alejandrina; Casillas, Marisa
Format: Online Article Text
Language: English
Published: Springer US 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8062390/
https://www.ncbi.nlm.nih.gov/pubmed/32875399
http://dx.doi.org/10.3758/s13428-020-01460-x
_version_ 1783681752775000064
author Räsänen, Okko
Seshadri, Shreyas
Lavechin, Marvin
Cristia, Alejandrina
Casillas, Marisa
author_facet Räsänen, Okko
Seshadri, Shreyas
Lavechin, Marvin
Cristia, Alejandrina
Casillas, Marisa
author_sort Räsänen, Okko
collection PubMed
description Recordings captured by wearable microphones are a standard method for investigating young children’s language environments. A key measure to quantify from such data is the amount of speech present in children’s home environments. To this end, the LENA recorder and software, a popular system for measuring linguistic input, estimates the number of adult words that children may hear over the course of a recording. However, word count estimation is challenging to do in a language-independent manner; the relationship between observable acoustic patterns and language-specific lexical entities is far from uniform across human languages. In this paper, we ask whether some alternative linguistic units, namely phone(me)s or syllables, could be measured instead of, or in parallel with, words in order to achieve improved cross-linguistic applicability and comparability of an automated system for measuring child language input. We discuss the advantages and disadvantages of measuring different units from theoretical and technical points of view. We also investigate the practical applicability of measuring such units using a novel system called Automatic LInguistic unit Count Estimator (ALICE) together with audio from seven child-centered daylong audio corpora from diverse cultural and linguistic environments. We show that language-independent measurement of phoneme counts is somewhat more accurate than syllables or words, but all three are highly correlated with human annotations on the same data. We share an open-source implementation of ALICE for use by the language research community, enabling automatic phoneme, syllable, and word count estimation from child-centered audio recordings.
format Online
Article
Text
id pubmed-8062390
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-8062390 2021-05-05 ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings Räsänen, Okko Seshadri, Shreyas Lavechin, Marvin Cristia, Alejandrina Casillas, Marisa Behav Res Methods Article
Springer US 2020-09-01 2021 /pmc/articles/PMC8062390/ /pubmed/32875399 http://dx.doi.org/10.3758/s13428-020-01460-x Text en © The Author(s) 2020 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Räsänen, Okko
Seshadri, Shreyas
Lavechin, Marvin
Cristia, Alejandrina
Casillas, Marisa
ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings
title ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8062390/
https://www.ncbi.nlm.nih.gov/pubmed/32875399
http://dx.doi.org/10.3758/s13428-020-01460-x