
Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis

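The Arabic language is challenging for automatic processing for several intrinsic reasons, such as its many dialects, ambiguous syntax, syntactic flexibility and diacritics. Machine learning and deep learning frameworks require large training datasets to make accurate predictions. This poses another challenge for researchers working with Arabic text, as high-quality Arabic textual datasets are still scarce. In this paper, an intelligent framework for expanding, or augmenting, Arabic sentences is presented. The sentences were initially labelled by human annotators for sentiment analysis. The novel approach presented in this work relies on the rich morphology of Arabic, synonymy lists, syntactic (grammatical) rules and negation rules to generate new sentences, with their proper labels, from the seed sentences. Most augmentation techniques target image or video data; this study is the first work to target text augmentation for the Arabic language. Using this framework, we were able to increase the size of the initial seed datasets tenfold. Experiments that assess the impact of this augmentation on sentiment analysis showed an average increase of 42% in accuracy, which we attribute to the reliability and high quality of the rules used to build the framework.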


Bibliographic Details
Main authors:	Duwairi, Rehab; Abushaqra, Ftoon
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Artificial Intelligence
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049132/ https://www.ncbi.nlm.nih.gov/pubmed/33954245 http://dx.doi.org/10.7717/peerj-cs.469

author	Duwairi, Rehab; Abushaqra, Ftoon
collection	PubMed
description	The Arabic language is challenging for automatic processing for several intrinsic reasons, such as its many dialects, ambiguous syntax, syntactic flexibility and diacritics. Machine learning and deep learning frameworks require large training datasets to make accurate predictions. This poses another challenge for researchers working with Arabic text, as high-quality Arabic textual datasets are still scarce. In this paper, an intelligent framework for expanding, or augmenting, Arabic sentences is presented. The sentences were initially labelled by human annotators for sentiment analysis. The novel approach presented in this work relies on the rich morphology of Arabic, synonymy lists, syntactic (grammatical) rules and negation rules to generate new sentences, with their proper labels, from the seed sentences. Most augmentation techniques target image or video data; this study is the first work to target text augmentation for the Arabic language. Using this framework, we were able to increase the size of the initial seed datasets tenfold. Experiments that assess the impact of this augmentation on sentiment analysis showed an average increase of 42% in accuracy, which we attribute to the reliability and high quality of the rules used to build the framework.
format	Online Article Text
id	pubmed-8049132
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
journal	PeerJ Comput Sci
published	2021-04-05
rights	© 2021 Duwairi and Abushaqra. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Syntactic- and morphology-based text augmentation framework for Arabic sentiment analysis
topic	Artificial Intelligence
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049132/ https://www.ncbi.nlm.nih.gov/pubmed/33954245 http://dx.doi.org/10.7717/peerj-cs.469


Similar Items