
Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis

Arabic is a challenging language for automatic processing, owing to intrinsic properties such as its multiple dialects, ambiguous syntax, syntactic flexibility, and diacritics. Machine learning and deep learning frameworks require large training datasets to ensure accurate pre...


Bibliographic Details
Main Authors: Duwairi, Rehab; Abushaqra, Ftoon
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049132/
https://www.ncbi.nlm.nih.gov/pubmed/33954245
http://dx.doi.org/10.7717/peerj-cs.469
author Duwairi, Rehab
Abushaqra, Ftoon
collection PubMed
description Arabic is a challenging language for automatic processing, owing to intrinsic properties such as its multiple dialects, ambiguous syntax, syntactic flexibility, and diacritics. Machine learning and deep learning frameworks require large training datasets to ensure accurate predictions. This creates a further challenge for researchers working with Arabic text, because high-quality Arabic textual datasets are still scarce. In this paper, an intelligent framework for expanding, or augmenting, Arabic sentences is presented. The sentences were initially labelled by human annotators for sentiment analysis. The novel approach presented in this work relies on the rich morphology of Arabic, synonymy lists, syntactic or grammatical rules, and negation rules to generate new sentences, with their proper labels, from the seed sentences. Most augmentation techniques target image or video data. This study is the first work to target text augmentation for the Arabic language. Using this framework, we were able to increase the size of the initial seed datasets tenfold. Experiments that assess the impact of this augmentation on sentiment analysis showed a 42% average increase in accuracy, due to the reliability and the high quality of the rules used to build this framework.
format Online
Article
Text
id pubmed-8049132
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8049132 2021-05-04 Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis Duwairi, Rehab Abushaqra, Ftoon PeerJ Comput Sci Artificial Intelligence PeerJ Inc. 2021-04-05 /pmc/articles/PMC8049132/ /pubmed/33954245 http://dx.doi.org/10.7717/peerj-cs.469 Text en © 2021 Duwairi and Abushaqra https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed.
For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis
topic Artificial Intelligence