
Arabic Syntactic Diacritics Restoration Using BERT Models


Bibliographic Details
Main Authors: Nazih, Waleed, Hifny, Yasser
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9637475/
https://www.ncbi.nlm.nih.gov/pubmed/36348654
http://dx.doi.org/10.1155/2022/3214255
_version_ 1784825196326158336
author Nazih, Waleed
Hifny, Yasser
author_facet Nazih, Waleed
Hifny, Yasser
author_sort Nazih, Waleed
collection PubMed
description The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulate syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves a 1.36% absolute improvement in case-ending error rate (CEER) over other systems.
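The abstract reports results as a case-ending error rate (CEER). As a minimal sketch (not code from the paper), assuming CEER is the fraction of words whose predicted case-ending diacritic differs from the reference label, the metric can be illustrated as:

```python
# Hypothetical sketch of the case-ending error rate (CEER):
# the fraction of words whose predicted case-ending (word-final)
# diacritic differs from the reference. Label names are illustrative.

def ceer(reference, predicted):
    """Return the case-ending error rate for parallel label sequences."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must be the same length")
    errors = sum(r != p for r, p in zip(reference, predicted))
    return errors / len(reference)

# Example with illustrative case-ending labels:
ref = ["damma", "fatha", "kasra", "sukun"]
pred = ["damma", "kasra", "kasra", "sukun"]
print(f"CEER: {ceer(ref, pred):.2%}")  # one of four words wrong -> 25.00%
```

Under this reading, an "absolute improvement" of 1.36% means the tagger's CEER is 1.36 percentage points lower than that of the compared systems.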
format Online
Article
Text
id pubmed-9637475
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-96374752022-11-07 Arabic Syntactic Diacritics Restoration Using BERT Models Nazih, Waleed Hifny, Yasser Comput Intell Neurosci Research Article The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulate syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves a 1.36% absolute improvement in case-ending error rate (CEER) over other systems. Hindawi 2022-10-30 /pmc/articles/PMC9637475/ /pubmed/36348654 http://dx.doi.org/10.1155/2022/3214255 Text en Copyright © 2022 Waleed Nazih and Yasser Hifny. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Nazih, Waleed
Hifny, Yasser
Arabic Syntactic Diacritics Restoration Using BERT Models
title Arabic Syntactic Diacritics Restoration Using BERT Models
title_full Arabic Syntactic Diacritics Restoration Using BERT Models
title_fullStr Arabic Syntactic Diacritics Restoration Using BERT Models
title_full_unstemmed Arabic Syntactic Diacritics Restoration Using BERT Models
title_short Arabic Syntactic Diacritics Restoration Using BERT Models
title_sort arabic syntactic diacritics restoration using bert models
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9637475/
https://www.ncbi.nlm.nih.gov/pubmed/36348654
http://dx.doi.org/10.1155/2022/3214255
work_keys_str_mv AT nazihwaleed arabicsyntacticdiacriticsrestorationusingbertmodels
AT hifnyyasser arabicsyntacticdiacriticsrestorationusingbertmodels