
Multilingual Twitter Sentiment Classification: The Role of Human Annotators

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
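
To illustrate the annotator agreement idea referred to in the abstract, the following is a minimal Python sketch, not the authors' code: it computes a weighted Cohen's kappa between two hypothetical annotators over the ordered scale negative < neutral < positive. The example labels, the annotator names, and the choice of weighted kappa (rather than the specific agreement measures applied in the study) are assumptions for illustration only.

# Illustrative sketch (assumed, not from the paper): chance-corrected agreement
# between two annotators, with linear weights reflecting the class ordering,
# so a negative/positive disagreement costs more than negative/neutral.
from collections import Counter

ORDER = {"negative": 0, "neutral": 1, "positive": 2}

def weighted_kappa(labels_a, labels_b, weights="linear"):
    assert len(labels_a) == len(labels_b) and labels_a
    k = len(ORDER)
    n = len(labels_a)

    # Observed label-pair counts and per-annotator marginals.
    conf = Counter(zip(labels_a, labels_b))
    marg_a = Counter(labels_a)
    marg_b = Counter(labels_b)

    def w(i, j):
        # Disagreement weight between class indices i and j.
        d = abs(i - j)
        return d / (k - 1) if weights == "linear" else (d / (k - 1)) ** 2

    # Observed vs. chance-expected weighted disagreement.
    obs = sum(w(ORDER[a], ORDER[b]) * c for (a, b), c in conf.items()) / n
    exp = sum(
        w(ORDER[a], ORDER[b]) * marg_a[a] * marg_b[b]
        for a in ORDER for b in ORDER
    ) / (n * n)
    return 1.0 - obs / exp if exp else 1.0

# Hypothetical labels for six doubly annotated tweets.
ann1 = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
ann2 = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
print(f"weighted kappa = {weighted_kappa(ann1, ann2):.3f}")

Monitoring a statistic of this kind over time for self-agreement (one annotator relabeling the same tweets) and inter-annotator agreement is the kind of quality check the abstract argues for; the exact measures used in the published study may differ.
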


Bibliographic Details
Main Authors: Mozetič, Igor, Grčar, Miha, Smailović, Jasmina
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858191/
https://www.ncbi.nlm.nih.gov/pubmed/27149621
http://dx.doi.org/10.1371/journal.pone.0155036
_version_ 1782430767363653632
author Mozetič, Igor
Grčar, Miha
Smailović, Jasmina
author_facet Mozetič, Igor
Grčar, Miha
Smailović, Jasmina
author_sort Mozetič, Igor
collection PubMed
description What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
format Online
Article
Text
id pubmed-4858191
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-48581912016-05-13 Multilingual Twitter Sentiment Classification: The Role of Human Annotators Mozetič, Igor Grčar, Miha Smailović, Jasmina PLoS One Research Article What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered. Public Library of Science 2016-05-05 /pmc/articles/PMC4858191/ /pubmed/27149621 http://dx.doi.org/10.1371/journal.pone.0155036 Text en © 2016 Mozetič et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Mozetič, Igor
Grčar, Miha
Smailović, Jasmina
Multilingual Twitter Sentiment Classification: The Role of Human Annotators
title Multilingual Twitter Sentiment Classification: The Role of Human Annotators
title_full Multilingual Twitter Sentiment Classification: The Role of Human Annotators
title_fullStr Multilingual Twitter Sentiment Classification: The Role of Human Annotators
title_full_unstemmed Multilingual Twitter Sentiment Classification: The Role of Human Annotators
title_short Multilingual Twitter Sentiment Classification: The Role of Human Annotators
title_sort multilingual twitter sentiment classification: the role of human annotators
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858191/
https://www.ncbi.nlm.nih.gov/pubmed/27149621
http://dx.doi.org/10.1371/journal.pone.0155036
work_keys_str_mv AT mozeticigor multilingualtwittersentimentclassificationtheroleofhumanannotators
AT grcarmiha multilingualtwittersentimentclassificationtheroleofhumanannotators
AT smailovicjasmina multilingualtwittersentimentclassificationtheroleofhumanannotators