TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets

COVID-19, caused by SARS-CoV2 infection, varies greatly in its severity but presents with serious respiratory symptoms with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals. Uncertainty remains over ke...

Bibliographic Details
Main Authors: Satu, Md. Shahriare, Khan, Md. Imran, Mahmud, Mufti, Uddin, Shahadat, Summers, Matthew A., Quinn, Julian M.W., Moni, Mohammad Ali
Format: Online Article Text
Language: English
Published: Published by Elsevier B.V. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8099549/
https://www.ncbi.nlm.nih.gov/pubmed/33972817
http://dx.doi.org/10.1016/j.knosys.2021.107126
_version_ 1783688593939628032
author Satu, Md. Shahriare
Khan, Md. Imran
Mahmud, Mufti
Uddin, Shahadat
Summers, Matthew A.
Quinn, Julian M.W.
Moni, Mohammad Ali
author_sort Satu, Md. Shahriare
collection PubMed
description COVID-19, caused by SARS-CoV-2 infection, varies greatly in severity but presents with serious respiratory symptoms and with vascular and other complications, particularly in older adults. The disease can be spread by both symptomatic and asymptomatic infected individuals. Uncertainty remains over key aspects of the virus's infectiousness (particularly that of the newly emerging variants), and the disease has had severe economic impacts globally. For these reasons, COVID-19 is the subject of intense and widespread discussion on social media platforms including Facebook and Twitter. These public forums substantially influence public opinion and in some cases can exacerbate the widespread panic and misinformation spread during the crisis. Thus, this work aimed to design an intelligent clustering-based classification and topic extraction model named TClustVID that analyzes COVID-19-related public tweets to extract significant sentiments with high accuracy. We gathered COVID-19 Twitter datasets from the IEEE Dataport repository and employed a range of data preprocessing methods to clean the raw data, then applied tokenization and produced a word-to-index dictionary. Thereafter, different classification methods were applied to these datasets, which enabled a comparison of the performance of traditional classification and TClustVID. Our analysis found that TClustVID showed higher performance compared to traditional methodologies that are determined by clustering criteria. Finally, we extracted significant topics from the clusters, split them into positive, neutral and negative sentiments, and identified the most frequent topics using the proposed model. This approach is able to rapidly identify commonly prevailing aspects of public opinions and attitudes related to COVID-19 and infection prevention strategies spreading among different populations.
format Online
Article
Text
id pubmed-8099549
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Published by Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-8099549 2021-05-06
Knowl Based Syst
Article
Published by Elsevier B.V. 2021-08-17 2021-05-06
/pmc/articles/PMC8099549/ /pubmed/33972817 http://dx.doi.org/10.1016/j.knosys.2021.107126
Text en
© 2021 Published by Elsevier B.V. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title TClustVID: A novel machine learning classification model to investigate topics and sentiment in COVID-19 tweets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8099549/
https://www.ncbi.nlm.nih.gov/pubmed/33972817
http://dx.doi.org/10.1016/j.knosys.2021.107126