
Improving Arabic Sentiment Analysis Using CNN-Based Architectures and Text Preprocessing

Sentiment analysis is an essential process which is important to many natural language applications. In this paper, we apply two models for Arabic sentiment analysis to the ASTD and ATDFS datasets, in both 2-class and multiclass forms. Model MC1 is a 2-layer CNN with global average pooling, followed...


Bibliographic Details
Main authors:	Mhamed, Mustafa; Sutcliffe, Richard; Sun, Xia; Feng, Jun; Almekhlafi, Eiad; Retta, Ephrem Afele
Format:	Online Article Text
Language:	English
Published:	Hindawi 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449738/ https://www.ncbi.nlm.nih.gov/pubmed/34545281 http://dx.doi.org/10.1155/2021/5538791

_version_	1784569481853403136
author	Mhamed, Mustafa; Sutcliffe, Richard; Sun, Xia; Feng, Jun; Almekhlafi, Eiad; Retta, Ephrem Afele
author_facet	Mhamed, Mustafa; Sutcliffe, Richard; Sun, Xia; Feng, Jun; Almekhlafi, Eiad; Retta, Ephrem Afele
author_sort	Mhamed, Mustafa
collection	PubMed
description	Sentiment analysis is an essential process which is important to many natural language applications. In this paper, we apply two models for Arabic sentiment analysis to the ASTD and ATDFS datasets, in both 2-class and multiclass forms. Model MC1 is a 2-layer CNN with global average pooling, followed by a dense layer. MC2 is a 2-layer CNN with max pooling, followed by a BiGRU and a dense layer. On the difficult ASTD 4-class task, we achieve 73.17%, compared to 65.58% reported by Attia et al., 2018. For the easier 2-class task, we achieve 90.06% with MC1 compared to 85.58% reported by Kwaik et al., 2019. We carry out experiments on various data splits, to match those used by other researchers. We also pay close attention to Arabic preprocessing and include novel steps not reported in other works. In an ablation study, we investigate the effect of two steps in particular, the processing of emoticons and the use of a custom stoplist. On the 4-class task, these can make a difference of up to 4.27% and 5.48%, respectively. On the 2-class task, the maximum improvements are 2.95% and 3.87%.
format	Online Article Text
id	pubmed-8449738
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-8449738 2021-09-19 Improving Arabic Sentiment Analysis Using CNN-Based Architectures and Text Preprocessing Mhamed, Mustafa; Sutcliffe, Richard; Sun, Xia; Feng, Jun; Almekhlafi, Eiad; Retta, Ephrem Afele Comput Intell Neurosci Research Article Sentiment analysis is an essential process which is important to many natural language applications. In this paper, we apply two models for Arabic sentiment analysis to the ASTD and ATDFS datasets, in both 2-class and multiclass forms. Model MC1 is a 2-layer CNN with global average pooling, followed by a dense layer. MC2 is a 2-layer CNN with max pooling, followed by a BiGRU and a dense layer. On the difficult ASTD 4-class task, we achieve 73.17%, compared to 65.58% reported by Attia et al., 2018. For the easier 2-class task, we achieve 90.06% with MC1 compared to 85.58% reported by Kwaik et al., 2019. We carry out experiments on various data splits, to match those used by other researchers. We also pay close attention to Arabic preprocessing and include novel steps not reported in other works. In an ablation study, we investigate the effect of two steps in particular, the processing of emoticons and the use of a custom stoplist. On the 4-class task, these can make a difference of up to 4.27% and 5.48%, respectively. On the 2-class task, the maximum improvements are 2.95% and 3.87%. Hindawi 2021-09-06 /pmc/articles/PMC8449738/ /pubmed/34545281 http://dx.doi.org/10.1155/2021/5538791 Text en Copyright © 2021 Mustafa Mhamed et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Improving Arabic Sentiment Analysis Using CNN-Based Architectures and Text Preprocessing
title_short	Improving Arabic Sentiment Analysis Using CNN-Based Architectures and Text Preprocessing
title_sort	improving arabic sentiment analysis using cnn-based architectures and text preprocessing
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449738/ https://www.ncbi.nlm.nih.gov/pubmed/34545281 http://dx.doi.org/10.1155/2021/5538791
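
The description field in this record names two preprocessing steps studied in the ablation: the processing of emoticons and the use of a custom stoplist. The record does not say how either step is implemented, so the following is only a minimal sketch of what such steps might look like. The emoticon map, the stoplist entries, and the function name `preprocess` are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the mapping, the stoplist, and the function
# name are assumptions, not the authors' implementation.

# Hypothetical map from emoticons to Arabic sentiment words.
EMOTICON_TOKENS = {
    ":)": "سعيد",  # "happy"
    ":(": "حزين",  # "sad"
}

# Hypothetical custom stoplist of frequent function words.
STOPLIST = {"في", "من", "على"}  # "in", "from", "on"


def preprocess(text, use_emoticons=True, use_stoplist=True):
    """Tokenize on whitespace, optionally applying the two ablated steps."""
    if use_emoticons:
        # Replace each known emoticon with a word the model can embed.
        for emoticon, token in EMOTICON_TOKENS.items():
            text = text.replace(emoticon, f" {token} ")
    tokens = text.split()
    if use_stoplist:
        # Drop tokens that appear in the custom stoplist.
        tokens = [t for t in tokens if t not in STOPLIST]
    return tokens
```

Toggling `use_emoticons` and `use_stoplist` independently mirrors the shape of an ablation: the same classifier would be trained on each variant of the preprocessed text and the accuracies compared.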


Similar items