Preprocessing Arabic text on social media

Currently, social media plays an important role in daily life and routine. Millions of people use social media for different purposes. Large amounts of data flow through online networks every second, and these data contain valuable information that can be extracted if the data are properly processed...

Bibliographic Details
Main authors: Hegazi, Mohamed Osman, Al-Dossari, Yasser, Al-Yahy, Abdullah, Al-Sumari, Abdulaziz, Hilal, Anwer
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7895730/
https://www.ncbi.nlm.nih.gov/pubmed/33644469
http://dx.doi.org/10.1016/j.heliyon.2021.e06191
author Hegazi, Mohamed Osman
Al-Dossari, Yasser
Al-Yahy, Abdullah
Al-Sumari, Abdulaziz
Hilal, Anwer
collection PubMed
description Social media plays an important role in daily life, and millions of people use it for different purposes. Large amounts of data flow through online networks every second, and these data contain valuable information that can be extracted if they are properly processed and analyzed. However, most processing results are affected by preprocessing difficulties. This paper presents an approach to extracting information from Arabic text on social media. It provides an integrated solution to the challenges of preprocessing such text in four stages: data collection, cleaning, enrichment, and availability. The preprocessed Arabic text is stored in structured database tables to provide a corpus to which information extraction and data analysis algorithms can be applied. The experiment in this study shows that implementing the proposed approach yields a full-featured dataset that presents the Arabic text at three structured levels with more than 20 features. The experiment also produces processed results such as topic classification and sentiment analysis.
format Online
Article
Text
id pubmed-7895730
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7895730 2021-02-25 Preprocessing Arabic text on social media Hegazi, Mohamed Osman Al-Dossari, Yasser Al-Yahy, Abdullah Al-Sumari, Abdulaziz Hilal, Anwer Heliyon Research Article Elsevier 2021-02-13 /pmc/articles/PMC7895730/ /pubmed/33644469 http://dx.doi.org/10.1016/j.heliyon.2021.e06191 Text en © 2021 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Preprocessing Arabic text on social media
topic Research Article
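
The description names four preprocessing stages (data collection, cleaning, enrichment, and availability) but this record does not give the paper's actual rules, features, or table layout. The following is only a minimal sketch of how the last three stages might look, under stated assumptions: the cleaning rules (dropping URLs and mentions, stripping diacritics and tatweel, unifying alef variants), the function names, the three example features, and the `posts` table are all hypothetical illustrations and are not taken from the article.

```python
import re
import sqlite3

# Assumed cleaning rules; the catalog record does not list the paper's own.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")  # Arabic short-vowel marks
TATWEEL = "\u0640"                                    # elongation character


def clean_arabic(text: str) -> str:
    """Cleaning stage (hypothetical rules): strip noise and normalize letters."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ").replace("_", " ")    # keep hashtag words
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    return " ".join(text.split())                      # collapse whitespace


def enrich(raw: str) -> dict:
    """Enrichment stage (hypothetical features) computed per post."""
    cleaned = clean_arabic(raw)
    return {
        "raw": raw,
        "cleaned": cleaned,
        "n_words": len(cleaned.split()),
        "n_hashtags": raw.count("#"),
        "has_url": int(bool(URL_RE.search(raw))),
    }


def store(rows: list, conn: sqlite3.Connection) -> None:
    """Availability stage: persist enriched rows in a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "raw TEXT, cleaned TEXT, n_words INTEGER, "
        "n_hashtags INTEGER, has_url INTEGER)"
    )
    conn.executemany(
        "INSERT INTO posts VALUES "
        "(:raw, :cleaned, :n_words, :n_hashtags, :has_url)",
        rows,
    )
    conn.commit()
```

In use, already-collected posts would be passed through `enrich` and then `store`, for example with `sqlite3.connect(":memory:")` for a throwaway database. Keeping the raw text next to the cleaned text is a design choice of this sketch, so that features such as hashtag counts can still be derived from the original post.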