Cargando…
Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset
In 2019 there was an outbreak of coronavirus pandemic also known as COVID-19. Many scientists believe that the pandemic originated from Wuhan, China, before spreading to other parts of the globe. To reduce the spread of the disease, decision makers encouraged measures such as hand washing, face mask...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
The Authors. Published by Elsevier Ltd.
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9443617/ https://www.ncbi.nlm.nih.gov/pubmed/36092862 http://dx.doi.org/10.1016/j.eswa.2022.118715 |
_version_ | 1784783025455759360 |
---|---|
author | Qorib, Miftahul Oladunni, Timothy Denis, Max Ososanya, Esther Cotae, Paul |
author_facet | Qorib, Miftahul Oladunni, Timothy Denis, Max Ososanya, Esther Cotae, Paul |
author_sort | Qorib, Miftahul |
collection | PubMed |
description | In 2019 there was an outbreak of coronavirus pandemic also known as COVID-19. Many scientists believe that the pandemic originated from Wuhan, China, before spreading to other parts of the globe. To reduce the spread of the disease, decision makers encouraged measures such as hand washing, face masking, and social distancing. In early 2021, some countries including the United States began administering COVID-19 vaccines. Vaccination brought a relief to the public; it also generated a lot of debates from anti-vaccine and pro-vaccine groups. The controversy and debate surrounding COVID-19 vaccine influenced the decision of several people in either to accept or reject vaccination. Because of data limitations, social media data, collected through live streaming public tweets using an Application Programming Interface (API) search, is considered a viable and reliable resource to study the opinion of the public on Covid-19 vaccine hesitancy. Thus, this study examines 3 sentiment computation methods (Azure Machine Learning, VADER, and TextBlob) to analyze COVID-19 vaccine hesitancy. Five learning algorithms (Random Forest, Logistics Regression, Decision Tree, LinearSVC, and Naïve Bayes) with different combination of three vectorization methods (Doc2Vec, CountVectorizer, and TF-IDF) were deployed. Vocabulary normalization was threefold; potter stemming, lemmatization, and potter stemming with lemmatization. For each vocabulary normalization strategy, we designed, developed, and evaluated 42 models. The study shows that Covid-19 vaccine hesitancy slowly decreases over time; suggesting that the public gradually feels warm and optimistic about COVID-19 vaccination. Moreover, combining potter stemming and lemmatization increased model performances. Finally, the result of our experiment shows that TextBlob + TF-IDF + LinearSVC has the best performance in classifying public sentiment into positive, neutral, or negative with an accuracy, precision, recall and F1 score of 0.96752, 0.96921, 0.92807 and 0.94702 respectively. It means that the best performance was achieved when using TextBlob sentiment score, with TF-IDF vectorization and LinearSVC classification model. We also found out that combining two vectorizations (CountVectorizer and TF-IDF) decreases model accuracy. |
format | Online Article Text |
id | pubmed-9443617 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | The Authors. Published by Elsevier Ltd. |
record_format | MEDLINE/PubMed |
spelling | pubmed-94436172022-09-06 Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset Qorib, Miftahul Oladunni, Timothy Denis, Max Ososanya, Esther Cotae, Paul Expert Syst Appl Article In 2019 there was an outbreak of coronavirus pandemic also known as COVID-19. Many scientists believe that the pandemic originated from Wuhan, China, before spreading to other parts of the globe. To reduce the spread of the disease, decision makers encouraged measures such as hand washing, face masking, and social distancing. In early 2021, some countries including the United States began administering COVID-19 vaccines. Vaccination brought a relief to the public; it also generated a lot of debates from anti-vaccine and pro-vaccine groups. The controversy and debate surrounding COVID-19 vaccine influenced the decision of several people in either to accept or reject vaccination. Because of data limitations, social media data, collected through live streaming public tweets using an Application Programming Interface (API) search, is considered a viable and reliable resource to study the opinion of the public on Covid-19 vaccine hesitancy. Thus, this study examines 3 sentiment computation methods (Azure Machine Learning, VADER, and TextBlob) to analyze COVID-19 vaccine hesitancy. Five learning algorithms (Random Forest, Logistics Regression, Decision Tree, LinearSVC, and Naïve Bayes) with different combination of three vectorization methods (Doc2Vec, CountVectorizer, and TF-IDF) were deployed. Vocabulary normalization was threefold; potter stemming, lemmatization, and potter stemming with lemmatization. For each vocabulary normalization strategy, we designed, developed, and evaluated 42 models. The study shows that Covid-19 vaccine hesitancy slowly decreases over time; suggesting that the public gradually feels warm and optimistic about COVID-19 vaccination. Moreover, combining potter stemming and lemmatization increased model performances. Finally, the result of our experiment shows that TextBlob + TF-IDF + LinearSVC has the best performance in classifying public sentiment into positive, neutral, or negative with an accuracy, precision, recall and F1 score of 0.96752, 0.96921, 0.92807 and 0.94702 respectively. It means that the best performance was achieved when using TextBlob sentiment score, with TF-IDF vectorization and LinearSVC classification model. We also found out that combining two vectorizations (CountVectorizer and TF-IDF) decreases model accuracy. The Authors. Published by Elsevier Ltd. 2023-02 2022-09-05 /pmc/articles/PMC9443617/ /pubmed/36092862 http://dx.doi.org/10.1016/j.eswa.2022.118715 Text en © 2022 The Authors Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Qorib, Miftahul Oladunni, Timothy Denis, Max Ososanya, Esther Cotae, Paul Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset |
title | Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset |
title_full | Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset |
title_fullStr | Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset |
title_full_unstemmed | Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset |
title_short | Covid-19 vaccine hesitancy: Text mining, sentiment analysis and machine learning on COVID-19 vaccination Twitter dataset |
title_sort | covid-19 vaccine hesitancy: text mining, sentiment analysis and machine learning on covid-19 vaccination twitter dataset |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9443617/ https://www.ncbi.nlm.nih.gov/pubmed/36092862 http://dx.doi.org/10.1016/j.eswa.2022.118715 |
work_keys_str_mv | AT qoribmiftahul covid19vaccinehesitancytextminingsentimentanalysisandmachinelearningoncovid19vaccinationtwitterdataset AT oladunnitimothy covid19vaccinehesitancytextminingsentimentanalysisandmachinelearningoncovid19vaccinationtwitterdataset AT denismax covid19vaccinehesitancytextminingsentimentanalysisandmachinelearningoncovid19vaccinationtwitterdataset AT ososanyaesther covid19vaccinehesitancytextminingsentimentanalysisandmachinelearningoncovid19vaccinationtwitterdataset AT cotaepaul covid19vaccinehesitancytextminingsentimentanalysisandmachinelearningoncovid19vaccinationtwitterdataset |