MABSA: A curated Malayalam aspect based sentiment analysis dataset on movie reviews
Regional languages are being used more frequently in online platforms as a result of the expanding use of digital technology. Understanding user opinions on social media platforms, forums, blogs, and other digital platforms that employ Indian regional languages has become significant due to their ro...
Main authors: | Mohan E, Syam; Sunitha, R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415832/ https://www.ncbi.nlm.nih.gov/pubmed/37577409 http://dx.doi.org/10.1016/j.dib.2023.109452 |
_version_ | 1785087633518493696 |
---|---|
author | Mohan E, Syam; Sunitha, R. |
author_facet | Mohan E, Syam; Sunitha, R. |
author_sort | Mohan E, Syam |
collection | PubMed |
description | Regional languages are being used more frequently on online platforms as a result of the expanding use of digital technology. Understanding user opinions on social media platforms, forums, blogs, and other digital platforms that employ Indian regional languages has become significant due to their role in various applications. Research on sentiment analysis of Indian regional language texts suffers from the lack of available regional language datasets. The curated Malayalam Aspect Based Sentiment Analysis (MABSA) dataset is a labeled dataset for Aspect Based Sentiment Analysis (ABSA) on the Indian regional language Malayalam in the movie review domain. Malayalam movie reviews, an excellent source of text data for ABSA, were collected through an online survey using Google Forms and by manually gathering reviews from three social media platforms: IMDb, Facebook, and YouTube. Nine target aspects were identified, and three annotators annotated the dataset based on the sentiment polarity of each aspect. A total of 4000 reviews were collected, and a total of 7507 aspects were identified in the reviews. Spearman's correlation and the Fleiss Kappa index were used to measure agreement among the annotators. The high correlation between the annotators indicates that the MABSA dataset is of gold-standard quality. |
format | Online Article Text |
id | pubmed-10415832 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10415832 2023-08-12 MABSA: A curated Malayalam aspect based sentiment analysis dataset on movie reviews Mohan E, Syam; Sunitha, R. Data Brief Data Article Regional languages are being used more frequently on online platforms as a result of the expanding use of digital technology. Understanding user opinions on social media platforms, forums, blogs, and other digital platforms that employ Indian regional languages has become significant due to their role in various applications. Research on sentiment analysis of Indian regional language texts suffers from the lack of available regional language datasets. The curated Malayalam Aspect Based Sentiment Analysis (MABSA) dataset is a labeled dataset for Aspect Based Sentiment Analysis (ABSA) on the Indian regional language Malayalam in the movie review domain. Malayalam movie reviews, an excellent source of text data for ABSA, were collected through an online survey using Google Forms and by manually gathering reviews from three social media platforms: IMDb, Facebook, and YouTube. Nine target aspects were identified, and three annotators annotated the dataset based on the sentiment polarity of each aspect. A total of 4000 reviews were collected, and a total of 7507 aspects were identified in the reviews. Spearman's correlation and the Fleiss Kappa index were used to measure agreement among the annotators. The high correlation between the annotators indicates that the MABSA dataset is of gold-standard quality. Elsevier 2023-07-26 /pmc/articles/PMC10415832/ /pubmed/37577409 http://dx.doi.org/10.1016/j.dib.2023.109452 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | MABSA: A curated Malayalam aspect based sentiment analysis dataset on movie reviews |
topic | Data Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415832/ https://www.ncbi.nlm.nih.gov/pubmed/37577409 http://dx.doi.org/10.1016/j.dib.2023.109452 |
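
The description above reports that agreement among three annotators was measured with the Fleiss Kappa index. As a rough illustration of how that statistic is computed, here is a minimal Python sketch. The function name `fleiss_kappa`, the label set, and the toy ratings are all hypothetical and are not taken from the MABSA dataset or the authors' code.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Compute Fleiss' kappa.

    ratings: list of items, each a list of labels (one label per annotator).
    categories: list of all possible labels.
    Every item must be rated by the same number of annotators.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of annotators who gave item i the label categories[j]
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean over items of the per-item pairwise agreement
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall label proportions
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical polarity labels and toy annotations (three annotators per item)
LABELS = ["positive", "negative", "neutral"]
toy_ratings = [
    ["positive", "positive", "positive"],
    ["positive", "positive", "negative"],
    ["negative", "negative", "negative"],
    ["neutral", "neutral", "positive"],
]
print(round(fleiss_kappa(toy_ratings, LABELS), 4))
```

Values near 1 indicate strong agreement beyond chance, while values near 0 indicate agreement no better than chance. The record does not state the actual label scheme or the kappa value obtained for MABSA, so the toy data here is only for illustration.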