Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)

Bibliographic Details
Main Authors: Kendall, Tyler; Vaughn, Charlotte; Farrington, Charlie; Gunter, Kaylynn; McLean, Jaidan; Tacata, Chloe; Arnson, Shelby
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117961/
https://www.ncbi.nlm.nih.gov/pubmed/33997775
http://dx.doi.org/10.3389/frai.2021.648543
_version_ 1783691666635358208
author Kendall, Tyler
Vaughn, Charlotte
Farrington, Charlie
Gunter, Kaylynn
McLean, Jaidan
Tacata, Chloe
Arnson, Shelby
author_facet Kendall, Tyler
Vaughn, Charlotte
Farrington, Charlie
Gunter, Kaylynn
McLean, Jaidan
Tacata, Chloe
Arnson, Shelby
author_sort Kendall, Tyler
collection PubMed
description Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over a half-century. Techniques for automating the measurement and coding for a wide range of sociolinguistic data have been on the rise over recent decades but procedures for coding some features, especially those without clearly defined acoustic correlates like (ING), have lagged behind others, such as vowels and sibilants. This paper explores computational methods for automatically coding variable (ING) in speech recordings, examining the use of automatic speech recognition procedures related to forced alignment (using the Montreal Forced Aligner) as well as supervised machine learning algorithms (linear and radial support vector machines, and random forests). Considering the automated coding of pronunciation variables like (ING) raises broader questions for sociolinguistic methods, such as how much different human analysts agree in their impressionistic codes for such variables and what data might act as the “gold standard” for training and testing of automated procedures. This paper explores several of these considerations in automated, and manual, coding of sociolinguistic variables and provides baseline performance data for automated and manual coding methods. We consider multiple ways of assessing algorithms' performance, including agreement with human coders, as well as the impact on the outcome of an analysis of (ING) that includes linguistic and social factors. Our results show promise for automated coding methods but also highlight that variability in results should be expected even with careful human-coded data. All data for our study come from the public Corpus of Regional African American Language, and code and derivative datasets (including our hand-coded data) are available with the paper.
format Online
Article
Text
id pubmed-8117961
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-81179612021-05-14 Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING) Kendall, Tyler Vaughn, Charlotte Farrington, Charlie Gunter, Kaylynn McLean, Jaidan Tacata, Chloe Arnson, Shelby Front Artif Intell Artificial Intelligence Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over a half-century. Techniques for automating the measurement and coding for a wide range of sociolinguistic data have been on the rise over recent decades but procedures for coding some features, especially those without clearly defined acoustic correlates like (ING), have lagged behind others, such as vowels and sibilants. This paper explores computational methods for automatically coding variable (ING) in speech recordings, examining the use of automatic speech recognition procedures related to forced alignment (using the Montreal Forced Aligner) as well as supervised machine learning algorithms (linear and radial support vector machines, and random forests). Considering the automated coding of pronunciation variables like (ING) raises broader questions for sociolinguistic methods, such as how much different human analysts agree in their impressionistic codes for such variables and what data might act as the “gold standard” for training and testing of automated procedures. This paper explores several of these considerations in automated, and manual, coding of sociolinguistic variables and provides baseline performance data for automated and manual coding methods. We consider multiple ways of assessing algorithms' performance, including agreement with human coders, as well as the impact on the outcome of an analysis of (ING) that includes linguistic and social factors. 
Our results show promise for automated coding methods but also highlight that variability in results should be expected even with careful human-coded data. All data for our study come from the public Corpus of Regional African American Language, and code and derivative datasets (including our hand-coded data) are available with the paper. Frontiers Media S.A. 2021-04-29 /pmc/articles/PMC8117961/ /pubmed/33997775 http://dx.doi.org/10.3389/frai.2021.648543 Text en Copyright © 2021 Kendall, Vaughn, Farrington, Gunter, McLean, Tacata and Arnson. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Kendall, Tyler
Vaughn, Charlotte
Farrington, Charlie
Gunter, Kaylynn
McLean, Jaidan
Tacata, Chloe
Arnson, Shelby
Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
title Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
title_full Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
title_fullStr Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
title_full_unstemmed Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
title_short Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
title_sort considering performance in the automated and manual coding of sociolinguistic variables: lessons from variable (ing)
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117961/
https://www.ncbi.nlm.nih.gov/pubmed/33997775
http://dx.doi.org/10.3389/frai.2021.648543
work_keys_str_mv AT kendalltyler consideringperformanceintheautomatedandmanualcodingofsociolinguisticvariableslessonsfromvariableing
AT vaughncharlotte consideringperformanceintheautomatedandmanualcodingofsociolinguisticvariableslessonsfromvariableing
AT farringtoncharlie consideringperformanceintheautomatedandmanualcodingofsociolinguisticvariableslessonsfromvariableing
AT gunterkaylynn consideringperformanceintheautomatedandmanualcodingofsociolinguisticvariableslessonsfromvariableing
AT mcleanjaidan consideringperformanceintheautomatedandmanualcodingofsociolinguisticvariableslessonsfromvariableing
AT tacatachloe consideringperformanceintheautomatedandmanualcodingofsociolinguisticvariableslessonsfromvariableing
AT arnsonshelby consideringperformanceintheautomatedandmanualcodingofsociolinguisticvariableslessonsfromvariableing