
Evolutionary Optimization of Ensemble Learning to Determine Sentiment Polarity in an Unbalanced Multiclass Corpus

Bibliographic Details
Main authors: García-Mendoza, Consuelo V.; Gambino, Omar J.; Villarreal-Cervantes, Miguel G.; Calvo, Hiram
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597113/
https://www.ncbi.nlm.nih.gov/pubmed/33286789
http://dx.doi.org/10.3390/e22091020

Sentiment polarity classification in social media is an important task, as it makes it possible to identify trends on particular subjects from a set of opinions. Considerable progress has recently been made using deep learning techniques, such as word embeddings, recurrent neural networks, and encoders such as BERT. Unfortunately, these techniques require large amounts of data, which in some cases are not available. To model this situation, challenges such as the Spanish TASS, organized by the Spanish Society for Natural Language Processing (SEPLN), have been proposed. These pose particular difficulties. First, the training and test sets are very unevenly sized, the test set being more than eight times the size of the training set. Another difficulty is the marked imbalance in the class distribution, which also differs between the two sets. Finally, there are four different labels, which requires adapting current classification methods to handle multiple classes. Traditional machine learning methods, such as Naïve Bayes, Logistic Regression, and Support Vector Machines, achieve modest performance under these conditions, but combined in an ensemble they can attain competitive performance. Several strategies for building classifier ensembles have been proposed; this paper proposes estimating an optimal weighting scheme using a Differential Evolution algorithm designed to deal with the particular issues that multiclass classification and unbalanced corpora pose. The ensemble with the proposed optimized weighting scheme improves the classification results on the full test set of the TASS challenge (General corpus), achieving state-of-the-art performance when compared with other works on this task, which make no use of NLP techniques.
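The abstract does not spell out the exact form of the weighting scheme or the fitness function, so the following is only a minimal sketch under stated assumptions: one weight per base classifier applied to its class-probability outputs (weighted soft voting), and macro-averaged F1 on a held-out validation set as the fitness, since it treats minority classes on par with majority ones. The optimizer is textbook DE/rand/1/bin with F = 0.5 and CR = 0.9, a common default rather than necessarily the authors' setting. The functions `weighted_vote`, `macro_f1`, and `differential_evolution`, and the toy data, are hypothetical.

```python
import random


def weighted_vote(probas, weights):
    """Weighted soft voting (assumed scheme: one weight per base classifier).

    probas[k][i][c] is the probability (or calibrated score) that classifier k
    assigns to class c for sample i. Returns the arg-max class per sample.
    """
    n_samples, n_classes = len(probas[0]), len(probas[0][0])
    preds = []
    for i in range(n_samples):
        scores = [sum(w * p[i][c] for w, p in zip(weights, probas))
                  for c in range(n_classes)]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1, so minority classes count fully."""
    total = 0.0
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / n_classes


def differential_evolution(fitness, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """DE/rand/1/bin maximizing `fitness` over weight vectors in [0, 1]^dim."""
    rng = random.Random(seed)
    # Individual 0 is plain unweighted voting; the others start at random.
    pop = [[1.0] * dim] + [[rng.random() for _ in range(dim)]
                           for _ in range(pop_size - 1)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        new_pop, new_scores = list(pop), list(scores)
        for i in range(pop_size):
            # Mutation uses three distinct individuals other than target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < cr:  # binomial crossover
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            s = fitness(trial)
            if s >= scores[i]:  # greedy one-to-one selection
                new_pop[i], new_scores[i] = trial, s
        pop, scores = new_pop, new_scores
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical validation data: 3 base classifiers, 4 samples, 4 classes.
y_val = [0, 1, 2, 3]
probas = [
    [[0.9, 0.05, 0.03, 0.02], [0.6, 0.3, 0.05, 0.05],
     [0.5, 0.2, 0.2, 0.1], [0.4, 0.3, 0.2, 0.1]],
    [[0.2, 0.5, 0.2, 0.1], [0.1, 0.8, 0.05, 0.05],
     [0.1, 0.2, 0.6, 0.1], [0.1, 0.2, 0.3, 0.4]],
    [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25],
     [0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7]],
]


def fitness(weights):
    return macro_f1(y_val, weighted_vote(probas, weights), n_classes=4)


best_w, best_score = differential_evolution(fitness, dim=len(probas),
                                            generations=50)
print(best_w, best_score)
```

Because scaling all weights by the same positive factor leaves the arg-max unchanged, only the relative weights matter; bounding them to [0, 1] just keeps the search space compact. Seeding the population with the unweighted ensemble means the optimized weights are never worse than plain soft voting on the validation data, although the fitness should be computed on data held out from both base-model training and final testing.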

Journal: Entropy (Basel)
Publication date: 2020-09-12
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).