
General Northern English. Exploring Regional Variation in the North of England With Machine Learning


Bibliographic Details
Main Authors: Strycharczuk, Patrycja; López-Ibáñez, Manuel; Brown, Georgina; Leemann, Adrian
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2020
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861339/
https://www.ncbi.nlm.nih.gov/pubmed/33733165
http://dx.doi.org/10.3389/frai.2020.00048
collection PubMed
description In this paper, we present a novel computational approach to the analysis of accent variation. The case study is dialect leveling in the North of England, manifested as reduction of accent variation across the North and emergence of General Northern English (GNE), a pan-regional standard accent associated with middle-class speakers. We investigated this instance of dialect leveling using random forest classification, with audio data from a crowd-sourced corpus of 105 urban, mostly highly educated speakers from five northern UK cities: Leeds, Liverpool, Manchester, Newcastle upon Tyne, and Sheffield. We trained random forest models to identify individual northern cities from a sample of other northern accents, based on the first two formant measurements of full vowel systems. We tested the models using unseen data. We relied on undersampling, bagging (bootstrap aggregation) and leave-one-out cross-validation to address some challenges associated with the data set, such as unbalanced data and relatively small sample size. The accuracy of classification provides us with a measure of relative similarity between different pairs of cities, while calculating conditional feature importance allows us to identify which input features (which vowels and which formants) have the largest influence in the prediction. We do find a considerable degree of leveling, especially between Manchester, Leeds and Sheffield, although some differences persist. The features that contribute to these differences most systematically are typically not the ones discussed in previous dialect descriptions. We propose that the most systematic regional features are also not salient, and as such, they serve as sociolinguistic regional indicators. We supplement the random forest results with a more traditional variationist description of by-city vowel systems, and we use both sources of evidence to inform a description of the vowels of General Northern English.
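The description names an evaluation scheme (classifying a target city against other northern accents, undersampling the larger class, bagging, leave-one-out cross-validation, feature importance). Below is a minimal, dependency-free Python sketch of that kind of scheme, not the authors' code. Assumptions not taken from this record: each row is one speaker whose columns are formant values (a hypothetical F1/F2-per-vowel layout); label 1 marks the target city and 0 the other cities; "leave-one-out" is read as holding out one speaker at a time. To stay within the standard library, the random forest is swapped for a nearest-centroid base learner, and importance is plain (unconditional) permutation importance rather than the conditional importance the authors report. All function names are illustrative.

```python
import random
from collections import Counter
from statistics import mean

# Hypothetical data layout: each row is one speaker, each column one formant
# measurement (e.g. F1 and F2 of every vowel); label 1 = target city, 0 = other
# northern cities. Every class needs at least two speakers for leave-one-out.


def fit_nearest_centroid(X, y):
    """Stand-in base learner (NOT a random forest): one mean vector per class."""
    return {
        c: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == c])]
        for c in set(y)
    }


def predict(model, x):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def undersample(X, y, rng):
    """Map each class to a random subset of its rows, all cut to the minority size."""
    by_class = {}
    for x, lab in zip(X, y):
        by_class.setdefault(lab, []).append(x)
    n = min(len(rows) for rows in by_class.values())
    return {lab: rng.sample(rows, n) for lab, rows in by_class.items()}


def bagged_predict(X, y, x_new, n_bags, rng):
    """Majority vote over n_bags base learners; each is fit on a bootstrap draw
    (with replacement, within each class) from a freshly undersampled set."""
    votes = Counter()
    for _ in range(n_bags):
        Xs, ys = [], []
        for lab, rows in undersample(X, y, rng).items():
            Xs += rng.choices(rows, k=len(rows))
            ys += [lab] * len(rows)
        votes[predict(fit_nearest_centroid(Xs, ys), x_new)] += 1
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, n_bags=25, seed=0):
    """Leave-one-out: hold out each speaker in turn, train on the rest, score it."""
    rng = random.Random(seed)
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += bagged_predict(X_train, y_train, X[i], n_bags, rng) == y[i]
    return hits / len(X)


def permutation_importance(X, y, feature, n_bags=25, seed=0):
    """Accuracy lost when one feature column is shuffled across speakers.
    Plain (marginal) permutation importance, not the conditional variant."""
    column = [x[feature] for x in X]
    random.Random(seed + 1).shuffle(column)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, column)]
    return loo_accuracy(X, y, n_bags, seed) - loo_accuracy(X_perm, y, n_bags, seed)
```

Running `loo_accuracy` on, for example, the speakers of two cities labeled 1 and 0 yields one accuracy per comparison; on this reading, lower accuracy means the two accents are harder to tell apart, which is how the description uses accuracy as a similarity measure. A feature whose `permutation_importance` is near zero contributes little to that separation.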
id pubmed-7861339
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Front Artif Intell (Frontiers in Artificial Intelligence)
published Frontiers Media S.A., 2020-07-15
record pubmed-7861339, 2021-03-16
rights Copyright © 2020 Strycharczuk, López-Ibáñez, Brown and Leemann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Artificial Intelligence