Arabic Syntactic Diacritics Restoration Using BERT Models
The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulated the syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves a 1.36% absolute case-ending error rate (CEER) over other systems.
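The abstract's framing — one case-ending label per word, scored by a case-ending error rate (CEER) — can be illustrated with a minimal sketch. The tag names and the CEER definition below are assumptions for illustration (taken here as the fraction of words whose predicted case-ending tag differs from the reference); the paper's exact label inventory and evaluation script are not reproduced in this record.

```python
# Hypothetical sketch: syntactic diacritics restoration as token sequence
# classification, with one case-ending tag per word (analogous to NER labels),
# scored by a simple case-ending error rate (CEER).

# Assumed example tag set for Arabic case endings (not the paper's exact inventory).
TAGS = ["fatha", "damma", "kasra", "sukun", "tanween_fath", "none"]

def ceer(predicted, reference):
    """Fraction of words whose predicted case-ending tag is wrong."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must align word-for-word")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)

# Each sentence becomes a per-word tag sequence, exactly like NER labeling.
reference = ["damma", "fatha", "kasra", "sukun"]
predicted = ["damma", "fatha", "fatha", "sukun"]
print(f"CEER = {ceer(predicted, reference):.2%}")  # prints "CEER = 25.00%"
```

In this framing, a pretrained BERT encoder with a token-classification head can emit one such tag per word, which is what makes the NER analogy in the abstract apt.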
| Main Authors: | Nazih, Waleed; Hifny, Yasser |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Hindawi, 2022 |
| Subjects: | Research Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9637475/ https://www.ncbi.nlm.nih.gov/pubmed/36348654 http://dx.doi.org/10.1155/2022/3214255 |
| Field | Value |
|---|---|
| _version_ | 1784825196326158336 |
author | Nazih, Waleed; Hifny, Yasser
author_facet | Nazih, Waleed; Hifny, Yasser
author_sort | Nazih, Waleed |
collection | PubMed |
description | The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulated the syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves a 1.36% absolute case-ending error rate (CEER) over other systems. |
format | Online Article Text |
id | pubmed-9637475 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9637475 2022-11-07 Arabic Syntactic Diacritics Restoration Using BERT Models Nazih, Waleed Hifny, Yasser Comput Intell Neurosci Research Article The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulated the syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves a 1.36% absolute case-ending error rate (CEER) over other systems. Hindawi 2022-10-30 /pmc/articles/PMC9637475/ /pubmed/36348654 http://dx.doi.org/10.1155/2022/3214255 Text en Copyright © 2022 Waleed Nazih and Yasser Hifny. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Nazih, Waleed Hifny, Yasser Arabic Syntactic Diacritics Restoration Using BERT Models |
title | Arabic Syntactic Diacritics Restoration Using BERT Models |
title_full | Arabic Syntactic Diacritics Restoration Using BERT Models |
title_fullStr | Arabic Syntactic Diacritics Restoration Using BERT Models |
title_full_unstemmed | Arabic Syntactic Diacritics Restoration Using BERT Models |
title_short | Arabic Syntactic Diacritics Restoration Using BERT Models |
title_sort | arabic syntactic diacritics restoration using bert models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9637475/ https://www.ncbi.nlm.nih.gov/pubmed/36348654 http://dx.doi.org/10.1155/2022/3214255 |
work_keys_str_mv | AT nazihwaleed arabicsyntacticdiacriticsrestorationusingbertmodels AT hifnyyasser arabicsyntacticdiacriticsrestorationusingbertmodels |