Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING)
Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over a half-century. Techniques for automating the measuremen...
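Main Authors: | Kendall, Tyler; Vaughn, Charlotte; Farrington, Charlie; Gunter, Kaylynn; McLean, Jaidan; Tacata, Chloe; Arnson, Shelby |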
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Artificial Intelligence |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117961/ https://www.ncbi.nlm.nih.gov/pubmed/33997775 http://dx.doi.org/10.3389/frai.2021.648543 |
_version_ | 1783691666635358208 |
---|---|
author | Kendall, Tyler; Vaughn, Charlotte; Farrington, Charlie; Gunter, Kaylynn; McLean, Jaidan; Tacata, Chloe; Arnson, Shelby |
author_sort | Kendall, Tyler |
collection | PubMed |
description | Impressionistic coding of sociolinguistic variables like English (ING), the alternation between pronunciations like talkin' and talking, has been a central part of the analytic workflow in studies of language variation and change for over a half-century. Techniques for automating the measurement and coding of a wide range of sociolinguistic data have been on the rise over recent decades, but procedures for coding some features, especially those without clearly defined acoustic correlates like (ING), have lagged behind others, such as vowels and sibilants. This paper explores computational methods for automatically coding variable (ING) in speech recordings, examining the use of automatic speech recognition procedures related to forced alignment (using the Montreal Forced Aligner) as well as supervised machine learning algorithms (linear and radial support vector machines, and random forests). Considering the automated coding of pronunciation variables like (ING) raises broader questions for sociolinguistic methods, such as how much different human analysts agree in their impressionistic codes for such variables and what data might act as the “gold standard” for training and testing of automated procedures. This paper explores several of these considerations in automated and manual coding of sociolinguistic variables and provides baseline performance data for automated and manual coding methods. We consider multiple ways of assessing algorithms' performance, including agreement with human coders, as well as the impact on the outcome of an analysis of (ING) that includes linguistic and social factors. Our results show promise for automated coding methods but also highlight that variability in results should be expected even with careful human-coded data. All data for our study come from the public Corpus of Regional African American Language, and code and derivative datasets (including our hand-coded data) are available with the paper. |
format | Online Article Text |
id | pubmed-8117961 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8117961 2021-05-14. Front Artif Intell (Artificial Intelligence). Frontiers Media S.A. 2021-04-29. /pmc/articles/PMC8117961/ /pubmed/33997775 http://dx.doi.org/10.3389/frai.2021.648543 Text en. Copyright © 2021 Kendall, Vaughn, Farrington, Gunter, McLean, Tacata and Arnson. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons From Variable (ING) |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8117961/ https://www.ncbi.nlm.nih.gov/pubmed/33997775 http://dx.doi.org/10.3389/frai.2021.648543 |