Tweet sentiment quantification: An experimental re-evaluation

Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called “prevalence”) of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts. This task is especially important when these texts ar...

Bibliographic Details
Main authors:	Moreo, Alejandro, Sebastiani, Fabrizio
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9481048/ https://www.ncbi.nlm.nih.gov/pubmed/36112639 http://dx.doi.org/10.1371/journal.pone.0263449

_version_	1784791176277131264
author	Moreo, Alejandro Sebastiani, Fabrizio
collection	PubMed
description	Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called “prevalence”) of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts. This task is especially important when these texts are tweets, since the final goal of most sentiment classification efforts carried out on Twitter data is actually quantification (and not the classification of individual tweets). It is well-known that solving quantification by means of “classify and count” (i.e., by classifying all unlabelled items by means of a standard classifier and counting the items that have been assigned to a given class) is less than optimal in terms of accuracy, and that more accurate quantification methods exist. Gao and Sebastiani 2016 carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimentation carried out in that work was weak, and that the reliability of the conclusions that were drawn from the results is thus questionable. We here re-evaluate those quantification methods (plus a few more modern ones) on exactly the same datasets, this time following a now consolidated and robust experimental protocol (which also involves simulating the presence, in the test data, of class prevalence values very different from those of the training set). This experimental protocol (even without counting the newly added methods) involves a number of experiments 5,775 times larger than that of the original study. Due to the above-mentioned presence, in the test data, of samples characterised by class prevalence values very different from those of the training set, the results of our experiments are dramatically different from those obtained by Gao and Sebastiani, and provide a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
format	Online Article Text
id	pubmed-9481048
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9481048 2022-09-17 Tweet sentiment quantification: An experimental re-evaluation Moreo, Alejandro Sebastiani, Fabrizio PLoS One Research Article Public Library of Science 2022-09-16 /pmc/articles/PMC9481048/ /pubmed/36112639 http://dx.doi.org/10.1371/journal.pone.0263449 Text en © 2022 Moreo, Sebastiani https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Tweet sentiment quantification: An experimental re-evaluation
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9481048/ https://www.ncbi.nlm.nih.gov/pubmed/36112639 http://dx.doi.org/10.1371/journal.pone.0263449

Similar items