On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts

The main goal of this paper is to evaluate the usability of several algorithms on various sentiment-labeled datasets. The process of creating good semantic vector representations for textual data is considered a very demanding task for the research community. The first and most important step of a N...

Bibliographic Details
Main Authors: Sarlis, S., Maglogiannis, I.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256387/
http://dx.doi.org/10.1007/978-3-030-49161-1_34
_version_ 1783539896971952128
author Sarlis, S.
Maglogiannis, I.
collection PubMed
description The main goal of this paper is to evaluate the usability of several algorithms on various sentiment-labeled datasets. Creating good semantic vector representations for textual data is considered a very demanding task for the research community. The first and most important step of a Natural Language Processing (NLP) system is text preprocessing, which greatly affects the overall accuracy of the classification algorithms. In this work, two vector space models are created, and a study covering a variety of algorithms is performed on them. The work is based on the IMDb dataset, which contains movie reviews along with their associated labels (positive or negative). The goal is to obtain the model with the highest accuracy and the best generalization. To measure how well these models generalize to other domains, several datasets, which are analyzed later in the paper, are used.
format Online
Article
Text
id pubmed-7256387
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-72563872020-05-29 On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts Sarlis, S. Maglogiannis, I. Artificial Intelligence Applications and Innovations Article The main goal of this paper is to evaluate the usability of several algorithms on various sentiment-labeled datasets. The process of creating good semantic vector representations for textual data is considered a very demanding task for the research community. The first and most important step of a Natural Language Processing (NLP) system, is text preprocessing, which greatly affects the overall accuracy of the classification algorithms. In this work, two vector space models are created, and a study consisting of a variety of algorithms, is performed on them. The work is based on the IMDb dataset which contains movie reviews along with their associated labels (positive or negative). The goal is to obtain the model with the highest accuracy and the best generalization. To measure how well these models generalize in other domains, several datasets, which are further analyzed later, are used. 2020-05-06 /pmc/articles/PMC7256387/ http://dx.doi.org/10.1007/978-3-030-49161-1_34 Text en © IFIP International Federation for Information Processing 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.