Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter

Bibliographic Details
Main Authors: Kim, Myeong Gyu, Kim, Minjung, Kim, Jae Hyun, Kim, Kyungim
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103576/
https://www.ncbi.nlm.nih.gov/pubmed/35564518
http://dx.doi.org/10.3390/ijerph19095126
_version_ 1784707589304483840
author Kim, Myeong Gyu
Kim, Minjung
Kim, Jae Hyun
Kim, Kyungim
author_facet Kim, Myeong Gyu
Kim, Minjung
Kim, Jae Hyun
Kim, Kyungim
author_sort Kim, Myeong Gyu
collection PubMed
description Garlic-related misinformation is prevalent whenever a virus outbreak occurs. With the outbreak of COVID-19, garlic-related misinformation is spreading through social media, including Twitter. Bidirectional Encoder Representations from Transformers (BERT) can be used to classify misinformation from a vast number of tweets. This study aimed to apply the BERT model for classifying misinformation on garlic and COVID-19 on Twitter, using 5929 original tweets mentioning garlic and COVID-19 (4151 for fine-tuning, 1778 for test). Tweets were manually labeled as ‘misinformation’ and ‘other.’ We fine-tuned five BERT models (BERT(BASE), BERT(LARGE), BERTweet-base, BERTweet-COVID-19, and BERTweet-large) using a general COVID-19 rumor dataset or a garlic-specific dataset. Accuracy and F1 score were calculated to evaluate the performance of the models. The BERT models fine-tuned with the COVID-19 rumor dataset showed poor performance, with maximum accuracy of 0.647. BERT models fine-tuned with the garlic-specific dataset showed better performance. BERTweet models achieved accuracy of 0.897–0.911, while BERT(BASE) and BERT(LARGE) achieved accuracy of 0.887–0.897. BERTweet-large showed the best performance with maximum accuracy of 0.911 and an F1 score of 0.894. Thus, BERT models showed good performance in classifying misinformation. The results of our study will help detect misinformation related to garlic and COVID-19 on Twitter.
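The description reports accuracy and F1 score on a held-out test set. The sketch below illustrates how those two metrics are conventionally computed for a two-class task, and checks the split sizes stated above (4151 + 1778 = 5929). It is not the authors' code: the function name `accuracy_and_f1` and the toy labels are hypothetical, and treating 'misinformation' as the positive class for F1 is an assumption, since the record does not say how F1 was averaged.

```python
def accuracy_and_f1(y_true, y_pred, positive="misinformation"):
    """Return (accuracy, F1) for a binary task.

    Assumption: F1 is computed for the `positive` class only; the
    catalog record does not state which averaging the study used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Hypothetical toy labels, for illustration only (not data from the study).
y_true = ["misinformation", "other", "misinformation", "other", "other"]
y_pred = ["misinformation", "other", "other", "other", "misinformation"]
acc, f1 = accuracy_and_f1(y_true, y_pred)

# Split sizes as stated in the description.
total, n_train, n_test = 5929, 4151, 1778
assert n_train + n_test == total
train_fraction = n_train / total  # roughly a 70/30 split
```

In practice a library routine such as scikit-learn's `accuracy_score` and `f1_score` would usually replace the hand-written function; the plain-Python version is used here only to keep the arithmetic visible.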
format Online
Article
Text
id pubmed-9103576
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9103576 2022-05-14 Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter Kim, Myeong Gyu Kim, Minjung Kim, Jae Hyun Kim, Kyungim Int J Environ Res Public Health Article MDPI 2022-04-22 /pmc/articles/PMC9103576/ /pubmed/35564518 http://dx.doi.org/10.3390/ijerph19095126 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kim, Myeong Gyu
Kim, Minjung
Kim, Jae Hyun
Kim, Kyungim
Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter
title Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter
title_sort fine-tuning bert models to classify misinformation on garlic and covid-19 on twitter
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103576/
https://www.ncbi.nlm.nih.gov/pubmed/35564518
http://dx.doi.org/10.3390/ijerph19095126
work_keys_str_mv AT kimmyeonggyu finetuningbertmodelstoclassifymisinformationongarlicandcovid19ontwitter
AT kimminjung finetuningbertmodelstoclassifymisinformationongarlicandcovid19ontwitter
AT kimjaehyun finetuningbertmodelstoclassifymisinformationongarlicandcovid19ontwitter
AT kimkyungim finetuningbertmodelstoclassifymisinformationongarlicandcovid19ontwitter