MABSA: A curated Malayalam aspect based sentiment analysis dataset on movie reviews

Regional languages are being used more frequently in online platforms as a result of the expanding use of digital technology. Understanding user opinions on social media platforms, forums, blogs, and other digital platforms that employ Indian regional languages has become significant due to their ro...


Bibliographic Details
Main authors: Mohan E, Syam; Sunitha, R.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects: Data Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415832/
https://www.ncbi.nlm.nih.gov/pubmed/37577409
http://dx.doi.org/10.1016/j.dib.2023.109452
author Mohan E, Syam
Sunitha, R.
collection PubMed
description Regional languages are used increasingly on online platforms as digital technology spreads. Understanding user opinions expressed in Indian regional languages on social media, forums, blogs, and other digital platforms has become important because of the role these opinions play in various applications. Research on sentiment analysis of Indian regional language texts is hampered by the lack of available regional language datasets. The curated Malayalam Aspect Based Sentiment Analysis (MABSA) dataset is a labeled dataset for Aspect Based Sentiment Analysis (ABSA) in the Indian regional language Malayalam, covering the movie review domain. Malayalam movie reviews, an excellent source of text data for ABSA, were collected through an online survey using Google Forms and by manually gathering reviews from three online platforms: IMDb, Facebook, and YouTube. Nine target aspects were identified, and three annotators labeled the sentiment polarity of each aspect. In total, 4000 reviews were collected and 7507 aspects were identified in them. Spearman's correlation and the Fleiss Kappa Index were used to measure agreement among the annotators. The high agreement observed indicates that the MABSA dataset is of gold-standard quality.
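The description names the Fleiss Kappa Index as one of the agreement measures for the three annotators. The following is a minimal sketch of how such a score can be computed, not the authors' own procedure. The function name `fleiss_kappa`, the label names (`pos`, `neg`, `neu`), and the toy ratings are all hypothetical and are not taken from the MABSA dataset.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings    -- list of items; each item is a list with one label per rater
    categories -- list of all possible labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # How many raters chose each category, per item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    pairs = n_raters * (n_raters - 1)
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / pairs
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        # Degenerate case: only one category was ever used. Kappa is
        # undefined here; returning 1.0 is a choice made for this sketch.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: four aspect mentions, three annotators each.
toy_ratings = [
    ["pos", "pos", "pos"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["neu", "neu", "pos"],
]
toy_kappa = fleiss_kappa(toy_ratings, ["pos", "neg", "neu"])
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.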
format Online
Article
Text
id pubmed-10415832
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10415832 2023-08-12 MABSA: A curated Malayalam aspect based sentiment analysis dataset on movie reviews Mohan E, Syam Sunitha, R. Data Brief Data Article Elsevier 2023-07-26 /pmc/articles/PMC10415832/ /pubmed/37577409 http://dx.doi.org/10.1016/j.dib.2023.109452 Text en © 2023 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title MABSA: A curated Malayalam aspect based sentiment analysis dataset on movie reviews
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415832/
https://www.ncbi.nlm.nih.gov/pubmed/37577409
http://dx.doi.org/10.1016/j.dib.2023.109452