
Crowdsourcing Dialect Characterization through Twitter

We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a...


Bibliographic Details
Main Authors:	Gonçalves, Bruno; Sánchez, David
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4237322/ https://www.ncbi.nlm.nih.gov/pubmed/25409174 http://dx.doi.org/10.1371/journal.pone.0112074

author	Gonçalves, Bruno; Sánchez, David
collection	PubMed
description	We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably enough, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.
format	Online Article Text
id	pubmed-4237322
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-42373222014-11-21 Crowdsourcing Dialect Characterization through Twitter Gonçalves, Bruno Sánchez, David PLoS One Research Article We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably enough, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character. Public Library of Science 2014-11-19 /pmc/articles/PMC4237322/ /pubmed/25409174 http://dx.doi.org/10.1371/journal.pone.0112074 Text en © 2014 Gonçalves, Sánchez http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Crowdsourcing Dialect Characterization through Twitter
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4237322/ https://www.ncbi.nlm.nih.gov/pubmed/25409174 http://dx.doi.org/10.1371/journal.pone.0112074


Similar Items