
Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)

Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over a half-century. Techniques for automating the measuremen...

Bibliographic Details
Main Authors:	Kendall, Tyler; Vaughn, Charlotte; Farrington, Charlie; Gunter, Kaylynn; McLean, Jaidan; Tacata, Chloe; Arnson, Shelby
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Artificial Intelligence
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117961/ https://www.ncbi.nlm.nih.gov/pubmed/33997775 http://dx.doi.org/10.3389/frai.2021.648543

author	Kendall, Tyler; Vaughn, Charlotte; Farrington, Charlie; Gunter, Kaylynn; McLean, Jaidan; Tacata, Chloe; Arnson, Shelby
author_sort	Kendall, Tyler
collection	PubMed
description	Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over half a century. Techniques for automating the measurement and coding of a wide range of sociolinguistic data have been on the rise over recent decades, but procedures for coding some features, especially those without clearly defined acoustic correlates like (ING), have lagged behind others, such as vowels and sibilants. This paper explores computational methods for automatically coding variable (ING) in speech recordings, examining the use of automatic speech recognition procedures related to forced alignment (using the Montreal Forced Aligner) as well as supervised machine learning algorithms (linear and radial support vector machines, and random forests). Considering the automated coding of pronunciation variables like (ING) raises broader questions for sociolinguistic methods, such as how much different human analysts agree in their impressionistic codes for such variables and what data might act as the “gold standard” for training and testing of automated procedures. This paper explores several of these considerations in automated and manual coding of sociolinguistic variables and provides baseline performance data for automated and manual coding methods. We consider multiple ways of assessing algorithms' performance, including agreement with human coders, as well as the impact on the outcome of an analysis of (ING) that includes linguistic and social factors. Our results show promise for automated coding methods but also highlight that variability in results should be expected even with careful human-coded data. All data for our study come from the public Corpus of Regional African American Language, and code and derivative datasets (including our hand-coded data) are available with the paper.
format	Online Article Text
id	pubmed-8117961
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8117961 2021-05-14 Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING) Kendall, Tyler; Vaughn, Charlotte; Farrington, Charlie; Gunter, Kaylynn; McLean, Jaidan; Tacata, Chloe; Arnson, Shelby Front Artif Intell Artificial Intelligence Frontiers Media S.A. 2021-04-29 /pmc/articles/PMC8117961/ /pubmed/33997775 http://dx.doi.org/10.3389/frai.2021.648543 Text en Copyright © 2021 Kendall, Vaughn, Farrington, Gunter, McLean, Tacata and Arnson. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
topic	Artificial Intelligence
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117961/ https://www.ncbi.nlm.nih.gov/pubmed/33997775 http://dx.doi.org/10.3389/frai.2021.648543