
Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model

Social media is increasingly being used to express opinions and attitudes toward vaccines. The vaccine stance of social media posts can be classified in almost real time using machine learning. We describe the use of a Transformer-based machine learning model for analyzing the vaccine stance of Italian...


Bibliographic Details
Main authors:	Cheatham, Susan; Kummervold, Per E.; Parisi, Lorenza; Lanfranchi, Barbara; Croci, Ileana; Comunello, Francesca; Rota, Maria Cristina; Filia, Antonietta; Tozzi, Alberto Eugenio; Rizzo, Caterina; Gesualdo, Francesco
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Public Health
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372360/ https://www.ncbi.nlm.nih.gov/pubmed/35968436 http://dx.doi.org/10.3389/fpubh.2022.948880

_version_	1784767363721199616
author	Cheatham, Susan; Kummervold, Per E.; Parisi, Lorenza; Lanfranchi, Barbara; Croci, Ileana; Comunello, Francesca; Rota, Maria Cristina; Filia, Antonietta; Tozzi, Alberto Eugenio; Rizzo, Caterina; Gesualdo, Francesco
author_sort	Cheatham, Susan
collection	PubMed
description	Social media is increasingly being used to express opinions and attitudes toward vaccines. The vaccine stance of social media posts can be classified in almost real time using machine learning. We describe the use of a Transformer-based machine learning model for analyzing the vaccine stance of Italian tweets, and demonstrate the need to address changes over time in vaccine-related language through periodic model retraining. Vaccine-related tweets were collected through a platform developed for the European Joint Action on Vaccination. Two datasets were collected, the first between November 2019 and June 2020, the second from April to September 2021. The tweets were manually categorized by three independent annotators. After cleaning, the total dataset consisted of 1,736 tweets in 3 categories (promotional, neutral, and discouraging). The manually classified tweets were used to train and test various machine learning models. The model that classified the data most similarly to humans was XLM-Roberta-large, a multilingual version of the Transformer-based model RoBERTa. The model hyper-parameters were tuned and the model was then run five times. The fine-tuned model with the best F-score over the validation dataset was selected. Running the selected fine-tuned model on just the first test dataset resulted in an accuracy of 72.8% (F-score 0.713). Using this model on the second test dataset resulted in a drop in accuracy of roughly 10 percentage points, to 62.1% (F-score 0.617), suggesting that the language differed between the two datasets. On the combined test datasets the accuracy was 70.1% (F-score 0.689). Retraining the model using data from the first and second datasets increased the accuracy over the second test dataset to 71.3% (F-score 0.713), an improvement of about 9 percentage points over training on the first dataset alone. The accuracy over the first test dataset remained the same at 72.8% (F-score 0.721). The accuracy over the combined test datasets was then 72.4% (F-score 0.720), an improvement of about 2 percentage points. Through fine-tuning a machine-learning model on task-specific data, the accuracy achieved in categorizing tweets was close to that expected of a single human annotator. Regular retraining of machine-learning models with recent data is advisable to maximize accuracy.
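The description reports accuracy and F-score for a three-category classifier and compares accuracies before and after retraining. The following is only a minimal sketch of how such figures can be computed, not the authors' evaluation code. The record does not say which F-score averaging was used, so macro averaging is an assumption here; the toy `gold` and `pred` lists and the helper names `accuracy`, `macro_f1`, and `point_change` are hypothetical. Only the percentages passed to `point_change` come from the description.

```python
from collections import Counter

# Categories named in the description.
LABELS = ("promotional", "neutral", "discouraging")


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the annotated label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def point_change(before_pct, after_pct):
    """Difference between two percentages, in percentage points."""
    return round(after_pct - before_pct, 1)


# Hypothetical toy annotations and predictions, for illustration only.
gold = ["promotional", "neutral", "discouraging",
        "neutral", "promotional", "discouraging"]
pred = ["promotional", "neutral", "neutral",
        "neutral", "discouraging", "discouraging"]

print("toy accuracy:", accuracy(gold, pred))
print("toy macro F1:", macro_f1(gold, pred))

# Accuracies reported in the description.
print("first -> second test set:", point_change(72.8, 62.1))
print("second test set after retraining:", point_change(62.1, 71.3))
print("combined test sets after retraining:", point_change(70.1, 72.4))
```

Expressing the changes in percentage points avoids confusion with relative change: a fall from 72.8% to 62.1% is a difference of accuracy values, not a 10% reduction of the original figure.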
format	Online Article Text
id	pubmed-9372360
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9372360 2022-08-13 Front Public Health Public Health Frontiers Media S.A. 2022-07-29 /pmc/articles/PMC9372360/ /pubmed/35968436 http://dx.doi.org/10.3389/fpubh.2022.948880 Text en Copyright © 2022 Cheatham, Kummervold, Parisi, Lanfranchi, Croci, Comunello, Rota, Filia, Tozzi, Rizzo and Gesualdo. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model
topic	Public Health
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372360/ https://www.ncbi.nlm.nih.gov/pubmed/35968436 http://dx.doi.org/10.3389/fpubh.2022.948880

