Cargando…

Arabic Syntactic Diacritics Restoration Using BERT Models

The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representati...

Descripción completa

Detalles Bibliográficos
Autores principales:	Nazih, Waleed, Hifny, Yasser
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Hindawi 2022
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9637475/ https://www.ncbi.nlm.nih.gov/pubmed/36348654 http://dx.doi.org/10.1155/2022/3214255

Descripción
Sumario:	The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulated the syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves a 1.36% absolute case-ending error rate (CEER) over other systems.

Arabic Syntactic Diacritics Restoration Using BERT Models

Ejemplares similares