
Sentiment analysis in tweets: an assessment study from classical to modern word representation models

With the exponential growth of social media networks, such as Twitter, plenty of user-generated data emerge daily. The short texts published on Twitter – the tweets – have earned significant attention as a rich source of information to guide many decision-making processes. However, their inherent ch...

Bibliographic Details
Main Authors:	Barreto, Sérgio; Moura, Ricardo; Carvalho, Jonnathan; Paes, Aline; Plastino, Alexandre
Format:	Online Article Text
Language:	English
Published:	Springer US 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9664439/ https://www.ncbi.nlm.nih.gov/pubmed/36406157 http://dx.doi.org/10.1007/s10618-022-00853-0

_version_	1784831100888023040
author	Barreto, Sérgio; Moura, Ricardo; Carvalho, Jonnathan; Paes, Aline; Plastino, Alexandre
author_facet	Barreto, Sérgio; Moura, Ricardo; Carvalho, Jonnathan; Paes, Aline; Plastino, Alexandre
author_sort	Barreto, Sérgio
collection	PubMed
description	With the exponential growth of social media networks, such as Twitter, plenty of user-generated data emerge daily. The short texts published on Twitter – the tweets – have earned significant attention as a rich source of information to guide many decision-making processes. However, their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks, including sentiment analysis. Sentiment classification is tackled mainly by machine learning-based classifiers. The literature has adopted different types of word representation models to transform tweets into vector-based inputs to feed sentiment classifiers. The representations range from simple count-based methods, such as bag-of-words, to more sophisticated ones, such as BERTweet, built upon the popular BERT architecture. Nevertheless, most studies evaluate those models using only a small number of datasets. Despite the progress made in recent years in language modeling, there is still a gap regarding a robust evaluation of induced embeddings applied to sentiment analysis on tweets. Furthermore, while fine-tuning the model on downstream tasks is prominent nowadays, less attention has been given to adjustments based on the specific linguistic style of the data. In this context, this study carries out an assessment of existing neural language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets from distinct domains and five classification algorithms. The evaluation includes static and contextualized representations. Contexts are assembled from Transformer-based autoencoder models that are also adapted based on the masked language model task, using a plethora of strategies.
format	Online Article Text
id	pubmed-9664439
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-9664439 2022-11-14 Sentiment analysis in tweets: an assessment study from classical to modern word representation models Barreto, Sérgio; Moura, Ricardo; Carvalho, Jonnathan; Paes, Aline; Plastino, Alexandre Data Min Knowl Discov Article Springer US 2022-11-15 2023 /pmc/articles/PMC9664439/ /pubmed/36406157 http://dx.doi.org/10.1007/s10618-022-00853-0 Text en © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2022, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Sentiment analysis in tweets: an assessment study from classical to modern word representation models
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9664439/ https://www.ncbi.nlm.nih.gov/pubmed/36406157 http://dx.doi.org/10.1007/s10618-022-00853-0
