An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic
Emotion recognition is a crucial task in Natural Language Processing (NLP) that enables machines to comprehend the feelings conveyed in the text. The task involves detecting and recognizing various human emotions like anger, fear, joy, and sadness. The applications of emotion recognition are diverse...
Main author: | Althobaiti, Maha Jarallah |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | Data Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654532/ https://www.ncbi.nlm.nih.gov/pubmed/38020433 http://dx.doi.org/10.1016/j.dib.2023.109745 |
_version_ | 1785136644777574400 |
---|---|
author | Althobaiti, Maha Jarallah |
author_facet | Althobaiti, Maha Jarallah |
author_sort | Althobaiti, Maha Jarallah |
collection | PubMed |
description | Emotion recognition is a crucial task in Natural Language Processing (NLP) that enables machines to comprehend the feelings conveyed in the text. The task involves detecting and recognizing various human emotions like anger, fear, joy, and sadness. The applications of emotion recognition are diverse, including mental health diagnosis, student support, and the detection of online suspicious behavior. Despite the substantial amount of literature available on emotion recognition in various languages, Arabic emotion recognition has received relatively little attention, leading to a scarcity of emotion-annotated corpora. This article presents the ArPanEmo dataset, a novel dataset for fine-grained emotion recognition of online posts in Arabic. The dataset comprises 11,128 online posts manually labeled for ten emotion categories or neutral, with Fleiss' kappa of 0.71. It is unique in that it focuses on the Saudi dialect and addresses topics related to the COVID-19 pandemic, making it the first and largest of its kind. Python's packages were utilized to collect online posts related to the COVID-19 pandemic from three sources: Twitter, YouTube, and online newspaper comments between March 2020 and March 2022. Upon collection of the online posts, each one underwent a semi-automatic classification process using a lexicon of emotion-related terms to determine whether it belonged to the neutral or emotion category. Subsequently, manual labeling was conducted to further categorize the emotional data into fine-grained emotion categories. We anticipate that the ArPanEmo dataset will enrich Arabic NLP resources and help in the development of machine learning and deep learning tools to identify emotions in a given text. It will also contribute to developing systems that monitor online suspicious behaviors or mental health disorders. The final dataset is formatted in CSV, consisting of three columns: the number of the post, the post's text, and the corresponding emotion label. This format facilitates incorporating and utilizing the dataset in any machine learning research. |
format | Online Article Text |
id | pubmed-10654532 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-106545322023-10-31 An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic Althobaiti, Maha Jarallah Data Brief Data Article Emotion recognition is a crucial task in Natural Language Processing (NLP) that enables machines to comprehend the feelings conveyed in the text. The task involves detecting and recognizing various human emotions like anger, fear, joy, and sadness. The applications of emotion recognition are diverse, including mental health diagnosis, student support, and the detection of online suspicious behavior. Despite the substantial amount of literature available on emotion recognition in various languages, Arabic emotion recognition has received relatively little attention, leading to a scarcity of emotion-annotated corpora. This article presents the ArPanEmo dataset, a novel dataset for fine-grained emotion recognition of online posts in Arabic. The dataset comprises 11,128 online posts manually labeled for ten emotion categories or neutral, with Fleiss' kappa of 0.71. It is unique in that it focuses on the Saudi dialect and addresses topics related to the COVID-19 pandemic, making it the first and largest of its kind. Python's packages were utilized to collect online posts related to the COVID-19 pandemic from three sources: Twitter, YouTube, and online newspaper comments between March 2020 and March 2022. Upon collection of the online posts, each one underwent a semi-automatic classification process using a lexicon of emotion-related terms to determine whether it belonged to the neutral or emotion category. Subsequently, manual labeling was conducted to further categorize the emotional data into fine-grained emotion categories. We anticipate that the ArPanEmo dataset will enrich Arabic NLP resources and help in the development of machine learning and deep learning tools to identify emotions in a given text. It will also contribute to developing systems that monitor online suspicious behaviors or mental health disorders. The final dataset is formatted in CSV, consisting of three columns: the number of the post, the post's text, and the corresponding emotion label. This format facilitates incorporating and utilizing the dataset in any machine learning research. Elsevier 2023-10-31 /pmc/articles/PMC10654532/ /pubmed/38020433 http://dx.doi.org/10.1016/j.dib.2023.109745 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Data Article Althobaiti, Maha Jarallah An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic |
title | An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic |
title_full | An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic |
title_fullStr | An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic |
title_full_unstemmed | An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic |
title_short | An open-source dataset for arabic fine-grained emotion recognition of online content amid COVID-19 pandemic |
title_sort | open-source dataset for arabic fine-grained emotion recognition of online content amid covid-19 pandemic |
topic | Data Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654532/ https://www.ncbi.nlm.nih.gov/pubmed/38020433 http://dx.doi.org/10.1016/j.dib.2023.109745 |
work_keys_str_mv | AT althobaitimahajarallah anopensourcedatasetforarabicfinegrainedemotionrecognitionofonlinecontentamidcovid19pandemic AT althobaitimahajarallah opensourcedatasetforarabicfinegrainedemotionrecognitionofonlinecontentamidcovid19pandemic |
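
The description field above says the dataset ships as a three-column CSV (post number, post text, emotion label) and that posts were first split into neutral versus emotion with a lexicon of emotion-related terms. A minimal Python sketch of reading such a file and applying a coarse lexicon check might look like the following. Everything specific here is an assumption for illustration, not taken from the record: the column names `post_id`, `text`, and `label`, the presence of a header row, the three-word stand-in lexicon, the label strings, and the two made-up sample rows. The authors' actual lexicon and matching procedure are not described in this record.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: the record only states that the CSV has three
# columns (post number, post text, emotion label), not what they are called.
COLUMNS = ("post_id", "text", "label")

# Hypothetical stand-in lexicon (fear, sadness, joy); the real lexicon used
# for the semi-automatic step is not given in this record.
EMOTION_TERMS = {"خوف", "حزن", "فرح"}


def read_posts(handle, has_header=True):
    """Read three-column CSV rows into dicts keyed by COLUMNS."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row, if one is assumed
    # Rows that do not have exactly three fields are skipped.
    return [dict(zip(COLUMNS, row)) for row in reader if len(row) == 3]


def label_counts(posts):
    """Count how many posts carry each emotion label."""
    return Counter(post["label"] for post in posts)


def prefilter(text, lexicon=EMOTION_TERMS):
    """Coarse first pass: 'emotion' if any whitespace-separated token is
    in the lexicon, otherwise 'neutral'."""
    tokens = set(text.split())
    return "emotion" if tokens & lexicon else "neutral"


# Two made-up rows in the assumed layout; they are not from ArPanEmo.
SAMPLE = (
    "post_id,text,label\n"
    "1,عندي خوف من الوباء,fear\n"
    "2,تم تحديث الجدول,neutral\n"
)

posts = read_posts(io.StringIO(SAMPLE))
for post in posts:
    print(post["post_id"], post["label"], prefilter(post["text"]))
print(label_counts(posts))
```

To try this against a downloaded copy of the dataset, `io.StringIO(SAMPLE)` would be replaced by a file opened with `open(path, encoding="utf-8", newline="")`, with `COLUMNS` and `has_header` adjusted to the real file. Exact token matching is deliberately naive: Arabic attaches prefixes and suffixes to words, so a practical pre-filter would likely need normalization or stemming first.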