
Big data directed acyclic graph model for real-time COVID-19 twitter stream detection

Every day, large volumes of data are continuously generated as streams on social media platforms such as Twitter, informing us about events around the world in real time. Notably, Twitter has been one of the effective platforms for updating countries' leaders and scientists during the coronavirus (COVID-19) pandemic...


Bibliographic Details
Main authors: Amen, Bakhtiar; Faiz, Syahirul; Do, Thanh-Toan
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8556703/
https://www.ncbi.nlm.nih.gov/pubmed/34744186
http://dx.doi.org/10.1016/j.patcog.2021.108404
author Amen, Bakhtiar
Faiz, Syahirul
Do, Thanh-Toan
collection PubMed
description Every day, large volumes of data are continuously generated as streams on social media platforms such as Twitter, informing us about events around the world in real time. Notably, Twitter has been one of the effective platforms for updating countries' leaders and scientists during the coronavirus (COVID-19) pandemic. Other people have also used the platform to post their concerns about the spread of the virus and the rapid increase in deaths globally. The aim of this work is to detect anomalous events associated with COVID-19 from Twitter. To this end, we propose a distributed Directed Acyclic Graph topology framework to aggregate and process large-scale real-time tweets related to COVID-19. The core of our system is a novel lightweight algorithm that automatically detects anomalous events. In addition, our system can identify, cluster, and visualize important keywords in tweets. On 18 August 2020, our model detected the highest anomaly, as many tweets that day mentioned casualty updates and debates on the pandemic. The three most frequent terms on Twitter were “covid”, “death”, and “Trump” (21,566, 11,779, and 4,761 occurrences, respectively), while the terms with the highest TF-IDF scores were “people” (0.63637), “school” (0.5921407), and “virus” (0.57385). In our clustering result, the words “death”, “corona”, and “case” are grouped into one cluster, whereas the words “pandemic”, “school”, and “president” are grouped into another. The terms in each cluster lie close to one another in vector space, indicating the topics of greatest concern to people on Twitter.
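The abstract reports both raw term counts and TF-IDF scores, and the two rankings differ. A minimal sketch of one common TF-IDF formulation (term frequency times log of inverse document frequency) may help show why. The record does not state which TF-IDF variant or tokenizer the authors used, so the function `tf_idf`, the whitespace tokenizer, and the toy tweets below are all illustrative assumptions, not the paper's implementation or data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df).

    Assumed variant: relative term frequency and unsmoothed natural-log IDF.
    """
    # Naive whitespace tokenizer (an assumption; real tweets need more cleaning).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy tweets, not data from the paper.
tweets = [
    "covid death cases rise",
    "covid school reopening debate",
    "covid virus spread",
]
scores = tf_idf(tweets)
```

In this toy corpus "covid" appears in every tweet, so its inverse document frequency is log(3/3) = 0 and its score is zero, while rarer terms such as "school" score higher. This is one plausible reason a very frequent term need not rank highest by TF-IDF; whether it explains the figures in the abstract is not stated in this record.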
format Online
Article
Text
id pubmed-8556703
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-8556703 2021-11-01 Big data directed acyclic graph model for real-time COVID-19 twitter stream detection Amen, Bakhtiar Faiz, Syahirul Do, Thanh-Toan Pattern Recognit Article Elsevier Ltd. 2022-03 2021-10-26 /pmc/articles/PMC8556703/ /pubmed/34744186 http://dx.doi.org/10.1016/j.patcog.2021.108404 Text en © 2021 Elsevier Ltd. All rights reserved.
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Big data directed acyclic graph model for real-time COVID-19 twitter stream detection
topic Article