IADD: An integrated Arabic dialect identification dataset

Bibliographic Details
Main author:	Zahir, Jihad
Format:	Online Article Text
Language:	English
Published:	Elsevier 2021
Subjects:	Data Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8741435/ https://www.ncbi.nlm.nih.gov/pubmed/35028349 http://dx.doi.org/10.1016/j.dib.2021.107777

Description
Summary:	The Arabic language has several variants that can be roughly grouped into three main categories: Classical Arabic (CA), Modern Standard Arabic (MSA), and Dialectal Arabic (DA). MSA and CA differ subtly in syntax, terminology, and pronunciation. DA, however, differs significantly from both CA and MSA in that it reflects the geographic location of the speaker, or at least the country of origin if mobility factors are taken into account. This paper presents IADD, an integrated dataset for Arabic dialect identification that contains [Formula: see text] texts representing Arabic dialects from 5 regions and 9 countries. The IADD dataset was created by combining subsets of five corpora to support the task of automatic Arabic dialect detection.
