Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology

The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set...

Bibliographic Details
Main author: Dunn, Jonathan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861279/
https://www.ncbi.nlm.nih.gov/pubmed/33733104
http://dx.doi.org/10.3389/frai.2019.00015
_version_ 1783647051642306560
author Dunn, Jonathan
author_facet Dunn, Jonathan
author_sort Dunn, Jonathan
collection PubMed
description The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that models for each language are able to robustly predict the region-of-origin of held-out samples better using Construction Grammars than using simpler syntactic features. These global-scale experiments are used to argue that new methods in computational sociolinguistics are able to provide more generalized models of regional variation that are essential for understanding language variation and change at scale.
format Online
Article
Text
id pubmed-7861279
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-78612792021-03-16 Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology Dunn, Jonathan Front Artif Intell Artificial Intelligence The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that models for each language are able to robustly predict the region-of-origin of held-out samples better using Construction Grammars than using simpler syntactic features. These global-scale experiments are used to argue that new methods in computational sociolinguistics are able to provide more generalized models of regional variation that are essential for understanding language variation and change at scale. Frontiers Media S.A. 2019-08-14 /pmc/articles/PMC7861279/ /pubmed/33733104 http://dx.doi.org/10.3389/frai.2019.00015 Text en Copyright © 2019 Dunn. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. 
No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Dunn, Jonathan
Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology
title Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology
title_full Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology
title_fullStr Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology
title_full_unstemmed Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology
title_short Global Syntactic Variation in Seven Languages: Toward a Computational Dialectology
title_sort global syntactic variation in seven languages: toward a computational dialectology
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7861279/
https://www.ncbi.nlm.nih.gov/pubmed/33733104
http://dx.doi.org/10.3389/frai.2019.00015
work_keys_str_mv AT dunnjonathan globalsyntacticvariationinsevenlanguagestowardacomputationaldialectology