Tweet sentiment quantification: An experimental re-evaluation

Bibliographic details
Main authors: Moreo, Alejandro; Sebastiani, Fabrizio
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9481048/
https://www.ncbi.nlm.nih.gov/pubmed/36112639
http://dx.doi.org/10.1371/journal.pone.0263449

Abstract

Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called “prevalence”) of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts. This task is especially important when these texts are tweets, since the final goal of most sentiment classification efforts carried out on Twitter data is actually quantification (and not the classification of individual tweets). It is well known that solving quantification by means of “classify and count” (i.e., by classifying all unlabelled items by means of a standard classifier and counting the items that have been assigned to a given class) is less than optimal in terms of accuracy, and that more accurate quantification methods exist. Gao and Sebastiani (2016) carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimentation carried out in that work was weak, and that the reliability of the conclusions drawn from the results is thus questionable. Here we re-evaluate those quantification methods (plus a few more modern ones) on exactly the same datasets, this time following a now consolidated and robust experimental protocol (which also involves simulating the presence, in the test data, of class prevalence values very different from those of the training set). This experimental protocol (even without counting the newly added methods) involves a number of experiments 5,775 times larger than that of the original study.

Due to the above-mentioned presence, in the test data, of samples characterised by class prevalence values very different from those of the training set, the results of our experiments are dramatically different from those obtained by Gao and Sebastiani, and provide a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
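To make the contrast between “classify and count” and a corrected quantifier concrete, here is a minimal Python sketch. It is not the authors' code and does not reproduce their experiments. It assumes a binary Positive/Negative setting, a hypothetical noisy classifier described only by assumed true-positive and false-positive rates (tpr, fpr), and uses Adjusted Classify and Count (ACC), one well-known correction, as an example of a “more accurate” method. Test samples are drawn at prescribed prevalence values spanning [0, 1], in the spirit of the protocol the abstract describes, where test prevalence can differ sharply from training prevalence. All function names are illustrative.

```python
import random


def classify_and_count(predictions):
    """CC: fraction of items the classifier labels Positive (encoded as 1)."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """ACC: correct the CC estimate using the classifier's tpr and fpr
    (in practice these are estimated on held-out labelled data).
    Solves p_cc = tpr * p + fpr * (1 - p) for p, then clips to [0, 1]."""
    p_cc = classify_and_count(predictions)
    if tpr == fpr:  # correction undefined; fall back to plain CC
        return p_cc
    p = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


def simulate_classifier(labels, tpr, fpr, rng):
    """Hypothetical classifier: predicts 1 for a positive item with
    probability tpr and for a negative item with probability fpr.
    Its error rates do not depend on the sample's prevalence."""
    return [1 if rng.random() < (tpr if y == 1 else fpr) else 0 for y in labels]


def sample_with_prevalence(prevalence, size, rng):
    """Draw a test sample whose true Positive prevalence is `prevalence`."""
    n_pos = round(prevalence * size)
    labels = [1] * n_pos + [0] * (size - n_pos)
    rng.shuffle(labels)
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    tpr, fpr = 0.8, 0.2  # assumed classifier error rates
    # Grid of test prevalences, including values far from any training prevalence
    for prevalence in [0.0, 0.25, 0.5, 0.75, 1.0]:
        labels = sample_with_prevalence(prevalence, 1000, rng)
        preds = simulate_classifier(labels, tpr, fpr, rng)
        cc = classify_and_count(preds)
        acc = adjusted_classify_and_count(preds, tpr, fpr)
        print(f"true={prevalence:.2f}  CC={cc:.3f}  ACC={acc:.3f}")
```

With these assumed rates, the expected CC estimate is 0.2 + 0.6·p for true prevalence p, so CC drifts toward 0.5 as p moves to the extremes, while ACC is unbiased in expectation when tpr and fpr are known exactly. The three-class case (Positive, Neutral, Negative) generalizes ACC to solving a small linear system, which this sketch omits.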

Journal: PLoS One (Research Article)
Published online: 2022-09-16
Copyright: © 2022 Moreo, Sebastiani. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.