COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter

Bibliographic Details
Main Authors: Müller, Martin, Salathé, Marcel, Kummervold, Per E.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10043293/
https://www.ncbi.nlm.nih.gov/pubmed/36998290
http://dx.doi.org/10.3389/frai.2023.1023281
collection PubMed
description INTRODUCTION: This study presents COVID-Twitter-BERT (CT-BERT), a transformer-based model pre-trained on a large corpus of COVID-19-related Twitter messages. CT-BERT is specifically designed for COVID-19 content, particularly from social media, and can be used for various natural language processing tasks such as classification, question-answering, and chatbots. This paper evaluates the performance of CT-BERT on different classification datasets and compares it with BERT-LARGE, its base model.
METHODS: The authors evaluate the performance of CT-BERT on five different classification datasets, including one in the target domain. The model's performance is compared to that of its base model, BERT-LARGE, to measure the marginal improvement. The authors also provide detailed information on the training process and the technical specifications of the model.
RESULTS: The results indicate that CT-BERT outperforms BERT-LARGE, with a marginal improvement of 10-30% on all five classification datasets. The largest improvements are observed in the target domain. The authors provide detailed performance metrics and discuss the significance of these results.
DISCUSSION: The study demonstrates the potential of pre-trained transformer models such as CT-BERT for COVID-19-related natural language processing tasks. The results indicate that CT-BERT can improve classification performance on COVID-19-related content, especially on social media. These findings have important implications for applications such as monitoring public sentiment and developing chatbots to provide COVID-19-related information. The study also highlights the importance of using domain-specific pre-trained models for specific natural language processing tasks. Overall, this work provides a valuable contribution to the development of COVID-19-related NLP models.
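
As an illustration of the classification use described in the abstract, the following is a minimal sketch of loading CT-BERT with the Hugging Face transformers library and running a single tweet through a sequence-classification head. The model ID refers to the checkpoint the authors released on the Hugging Face Hub; the label count and example tweet are placeholders, and in practice the head would first be fine-tuned on a labelled dataset.

# Minimal sketch: CT-BERT for tweet classification via Hugging Face transformers.
# The label count (num_labels=3) and the example tweet are illustrative only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "digitalepidemiologylab/covid-twitter-bert-v2"  # CT-BERT checkpoint on the Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# The classification head is randomly initialized until fine-tuned on task data.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

# Tokenize one tweet and run a forward pass without gradient tracking.
inputs = tokenizer(
    "Vaccines are now available at my local pharmacy.",  # placeholder tweet
    return_tensors="pt",
    truncation=True,
    max_length=128,
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # index of the highest-scoring label
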
id pubmed-10043293
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Front Artif Intell
published 2023-03-14
license Copyright © 2023 Müller, Salathé and Kummervold. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY): https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.