
An Analysis of French-Language Tweets About COVID-19 Vaccines: Supervised Learning Approach

BACKGROUND: As the COVID-19 pandemic progressed, disinformation, fake news, and conspiracy theories spread through many parts of society. However, the disinformation spreading through social media is, according to the literature, one of the causes of increased COVID-19 vaccine hesitancy. In this con...


Bibliographic Details
Main authors:	Sauvayre, Romy; Vernier, Jessica; Chauvière, Cédric
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116457/ https://www.ncbi.nlm.nih.gov/pubmed/35512274 http://dx.doi.org/10.2196/37831

_version_	1784710116368449536
author	Sauvayre, Romy; Vernier, Jessica; Chauvière, Cédric
author_sort	Sauvayre, Romy
collection	PubMed
description	BACKGROUND: As the COVID-19 pandemic progressed, disinformation, fake news, and conspiracy theories spread through many parts of society. However, the disinformation spreading through social media is, according to the literature, one of the causes of increased COVID-19 vaccine hesitancy. In this context, the analysis of social media posts is particularly important, but the large amount of data exchanged on social media platforms requires specific methods. This is why machine learning and natural language processing models are increasingly applied to social media data.
OBJECTIVE: The aim of this study is to examine the capability of the CamemBERT French-language model to faithfully predict the elaborated categories, with the knowledge that tweets about vaccination are often ambiguous, sarcastic, or irrelevant to the studied topic.
METHODS: A total of 901,908 unique French-language tweets related to vaccination published between July 12, 2021, and August 11, 2021, were extracted using Twitter’s application programming interface (version 2; Twitter Inc). Approximately 2000 randomly selected tweets were labeled with 2 types of categorizations: (1) arguments for (pros) or against (cons) vaccination (health measures included) and (2) type of content (scientific, political, social, or vaccination status). The CamemBERT model was fine-tuned and tested for the classification of French-language tweets. The model’s performance was assessed by computing the F1-score, and confusion matrices were obtained.
RESULTS: The accuracy of the applied machine learning reached up to 70.6% for the first classification (pro and con tweets) and up to 90% for the second classification (scientific and political tweets). Furthermore, a tweet was 1.86 times more likely to be incorrectly classified by the model if it contained fewer than 170 characters (odds ratio 1.86; 95% CI 1.20-2.86).
CONCLUSIONS: The accuracy of the model is affected by the classification chosen and the topic of the message examined. When the vaccine debate is jostled by contested political decisions, tweet content becomes so heterogeneous that the accuracy of the model drops for less differentiated classes. However, our tests showed that it is possible to improve the accuracy by selecting tweets using a new method based on tweet length.
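The abstract reports two kinds of statistics: an F1-score derived from a confusion matrix, and an odds ratio with a 95% CI. As a minimal sketch of how such figures are typically computed, the Python below uses only the standard library. The function names and every count in it are hypothetical and are not taken from the article. The abstract also does not say how the confidence interval was obtained, so the Wald interval on the log odds ratio used here is an assumption, chosen because it is a common default.

```python
import math


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1-score for one class, from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    Hypothetical layout (an assumption, not the article's table):
        a = short tweets misclassified, b = short tweets correct,
        c = long tweets misclassified,  d = long tweets correct.
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Illustrative counts only; they do not reproduce the reported results.
    print(f1_from_counts(tp=8, fp=2, fn=2))
    print(odds_ratio_wald_ci(a=20, b=10, c=10, d=20))
```

An interval whose lower bound stays above 1, as in the reported 1.20-2.86, indicates that the odds of misclassification are higher for the first group at the chosen confidence level.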
format	Online Article Text
id	pubmed-9116457
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9116457 2022-05-19 An Analysis of French-Language Tweets About COVID-19 Vaccines: Supervised Learning Approach Sauvayre, Romy Vernier, Jessica Chauvière, Cédric JMIR Med Inform Original Paper JMIR Publications 2022-05-17 /pmc/articles/PMC9116457/ /pubmed/35512274 http://dx.doi.org/10.2196/37831 Text en ©Romy Sauvayre, Jessica Vernier, Cédric Chauvière. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 17.05.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	An Analysis of French-Language Tweets About COVID-19 Vaccines: Supervised Learning Approach
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116457/ https://www.ncbi.nlm.nih.gov/pubmed/35512274 http://dx.doi.org/10.2196/37831
