On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts
The main goal of this paper is to evaluate the usability of several algorithms on various sentiment-labeled datasets. The process of creating good semantic vector representations for textual data is considered a very demanding task for the research community. The first and most important step of a N...
Main authors: | Sarlis, S., Maglogiannis, I. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256387/ http://dx.doi.org/10.1007/978-3-030-49161-1_34 |
_version_ | 1783539896971952128 |
---|---|
author | Sarlis, S. Maglogiannis, I. |
author_facet | Sarlis, S. Maglogiannis, I. |
author_sort | Sarlis, S. |
collection | PubMed |
description | The main goal of this paper is to evaluate the usability of several algorithms on various sentiment-labeled datasets. The process of creating good semantic vector representations for textual data is considered a very demanding task for the research community. The first and most important step of a Natural Language Processing (NLP) system is text preprocessing, which greatly affects the overall accuracy of the classification algorithms. In this work, two vector space models are created, and a study consisting of a variety of algorithms is performed on them. The work is based on the IMDb dataset, which contains movie reviews along with their associated labels (positive or negative). The goal is to obtain the model with the highest accuracy and the best generalization. To measure how well these models generalize in other domains, several datasets, which are further analyzed later, are used. |
format | Online Article Text |
id | pubmed-7256387 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-72563872020-05-29 On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts Sarlis, S. Maglogiannis, I. Artificial Intelligence Applications and Innovations Article The main goal of this paper is to evaluate the usability of several algorithms on various sentiment-labeled datasets. The process of creating good semantic vector representations for textual data is considered a very demanding task for the research community. The first and most important step of a Natural Language Processing (NLP) system, is text preprocessing, which greatly affects the overall accuracy of the classification algorithms. In this work, two vector space models are created, and a study consisting of a variety of algorithms, is performed on them. The work is based on the IMDb dataset which contains movie reviews along with their associated labels (positive or negative). The goal is to obtain the model with the highest accuracy and the best generalization. To measure how well these models generalize in other domains, several datasets, which are further analyzed later, are used. 2020-05-06 /pmc/articles/PMC7256387/ http://dx.doi.org/10.1007/978-3-030-49161-1_34 Text en © IFIP International Federation for Information Processing 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Sarlis, S. Maglogiannis, I. On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts |
title | On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts |
title_full | On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts |
title_fullStr | On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts |
title_full_unstemmed | On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts |
title_short | On the Reusability of Sentiment Analysis Datasets in Applications with Dissimilar Contexts |
title_sort | on the reusability of sentiment analysis datasets in applications with dissimilar contexts |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256387/ http://dx.doi.org/10.1007/978-3-030-49161-1_34 |
work_keys_str_mv | AT sarliss onthereusabilityofsentimentanalysisdatasetsinapplicationswithdissimilarcontexts AT maglogiannisi onthereusabilityofsentimentanalysisdatasetsinapplicationswithdissimilarcontexts |