A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis

Bibliographic Details
Main Authors:	Rustam, Furqan; Khalid, Madiha; Aslam, Waqar; Rupapara, Vaibhav; Mehmood, Arif; Choi, Gyu Sang
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7906356/ https://www.ncbi.nlm.nih.gov/pubmed/33630869 http://dx.doi.org/10.1371/journal.pone.0245909

collection	PubMed
description	The spread of Covid-19 has caused health concerns worldwide, and social media is increasingly used to share news and opinions about it. A realistic assessment of the situation is necessary to use resources optimally and appropriately. In this research, we perform sentiment analysis of Covid-19 tweets using a supervised machine learning approach. Identifying Covid-19 sentiments from tweets would allow informed decisions for better handling of the current pandemic. The dataset is extracted from Twitter using tweet IDs provided by IEEE DataPort. Tweets are retrieved by an in-house crawler built on the Tweepy library. The dataset is cleaned with preprocessing techniques, and sentiment labels are extracted using the TextBlob library. The contribution of this work is the performance evaluation of various machine learning classifiers using our proposed feature set, which is formed by concatenating bag-of-words (BoW) and term frequency-inverse document frequency (TF-IDF) features. Tweets are classified as positive, neutral, or negative. Classifier performance is evaluated on accuracy, precision, recall, and F1 score. For completeness, the dataset is further investigated with a Long Short-Term Memory (LSTM) deep learning architecture. The results show that the Extra Trees classifier outperforms all other models, achieving an accuracy of 0.93 with the proposed concatenated feature set. The LSTM achieves lower accuracy than the machine learning classifiers. To demonstrate the effectiveness of the proposed feature set, the results are compared with the VADER sentiment analysis technique based on the GloVe feature extraction approach.
id	pubmed-7906356
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
source	PLoS One, Research Article. Public Library of Science, published 2021-02-25. PubMed record pubmed-7906356 (2021-03-03).
rights	© 2021 Rustam et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
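The description says the proposed feature set is formed by concatenating bag-of-words and TF-IDF vectors. The following is a minimal plain-Python sketch of that idea, not the authors' code: the tokenizer is a naive lowercase whitespace split, the helper names (`tokenize`, `bow_and_tfidf`, `concatenate_features`) are hypothetical, and the IDF is assumed to take the smoothed form ln((1 + n) / (1 + df)) + 1 with L2-normalized rows (the default form used by scikit-learn's `TfidfVectorizer`). The paper's actual preprocessing and vectorizer settings are not given in this record.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase + whitespace split.
    # The paper's actual preprocessing steps are not reproduced here.
    return text.lower().split()


def bow_and_tfidf(docs):
    """Return (vocab, bow_rows, tfidf_rows) for a list of documents."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Sorted vocabulary gives every row the same, stable column order.
    vocab = sorted({term for c in counts for term in c})
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    # Smoothed IDF (assumed form): ln((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1.0 for term in vocab}

    bow_rows, tfidf_rows = [], []
    for c in counts:
        # Bag-of-words: raw term counts.
        bow_rows.append([float(c[term]) for term in vocab])
        # TF-IDF: raw count times IDF, then L2-normalized per document.
        raw = [c[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # avoid dividing by 0
        tfidf_rows.append([v / norm for v in raw])
    return vocab, bow_rows, tfidf_rows


def concatenate_features(docs):
    """Concatenate BoW and TF-IDF vectors: [counts | tf-idf] per document."""
    vocab, bow_rows, tfidf_rows = bow_and_tfidf(docs)
    # Each combined row has 2 * len(vocab) columns.
    return vocab, [b + t for b, t in zip(bow_rows, tfidf_rows)]


if __name__ == "__main__":
    tweets = ["stay home stay safe", "vaccine news is good", "cases rising again"]
    vocab, X = concatenate_features(tweets)
    print(len(vocab), len(X[0]))  # prints "10 20"
```

Each combined row can then be passed to any classifier. With scikit-learn, for instance, one could build the same kind of matrix with `CountVectorizer` and `TfidfVectorizer`, join the two sparse outputs with `scipy.sparse.hstack`, and fit an `ExtraTreesClassifier`; this record does not state which implementation or hyperparameters the authors used.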
