
Using Twitter to collect a multi-dialectal corpus of Albanian using advanced geotagging and dialect modeling

In this study, we present the acquisition and categorization of a geographically-informed, multi-dialectal Albanian National Corpus, derived from Twitter data. The primary dialects from three distinct regions—Albania, Kosovo, and North Macedonia—are considered. The assembled publicly available datas...


Bibliographic Details
Main authors: Canhasi, Ercan; Shijaku, Rexhep
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681245/
https://www.ncbi.nlm.nih.gov/pubmed/38011168
http://dx.doi.org/10.1371/journal.pone.0294284