Semantic Analysis and Topic Modelling of Web-Scrapped COVID-19 Tweet Corpora through Data Mining Methodologies

The evolution of the coronavirus (COVID-19) disease took a toll on the social, healthcare, economic, and psychological prosperity of human beings. In the past couple of months, many organizations, individuals, and governments have adopted Twitter to convey their sentiments on COVID-19, the lockdown,...

Bibliographic Details
Main authors: Gourisaria, Mahendra Kumar, Chandra, Satish, Das, Himansu, Patra, Sudhansu Shekhar, Sahni, Manoj, Leon-Castro, Ernesto, Singh, Vijander, Kumar, Sandeep
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9141192/
https://www.ncbi.nlm.nih.gov/pubmed/35628018
http://dx.doi.org/10.3390/healthcare10050881
collection PubMed
description The evolution of the coronavirus (COVID-19) disease took a toll on the social, healthcare, economic, and psychological prosperity of human beings. In the past couple of months, many organizations, individuals, and governments have adopted Twitter to convey their sentiments on COVID-19, the lockdown, the pandemic, and hashtags. This paper aims to analyze the psychological reactions and discourse of Twitter users related to COVID-19. In this experiment, Latent Dirichlet Allocation (LDA) has been used for topic modeling. In addition, a Bidirectional Long Short-Term Memory (BiLSTM) model and various classification techniques such as random forest, support vector machine, logistic regression, naive Bayes, decision tree, logistic regression with stochastic gradient descent optimizer, and majority voting classifier have been adapted for analyzing the polarity of sentiment. The effectiveness of the aforesaid approaches along with LDA modeling has been tested, validated, and compared with several benchmark datasets and on a newly generated dataset for analysis. To achieve better results, a dual dataset approach has been incorporated to determine the frequency of positive and negative tweets and word clouds, which helps to identify the most effective model for analyzing the corpora. The experimental result shows that the BiLSTM approach outperforms the other approaches with an accuracy of 96.7%.
id pubmed-9141192
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9141192 2022-05-28 Healthcare (Basel) Article MDPI 2022-05-10 /pmc/articles/PMC9141192/ /pubmed/35628018 http://dx.doi.org/10.3390/healthcare10050881 Text en
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).