An ensemble predictive analytics of COVID-19 infodemic tweets using bag of words

Fake COVID-19 tweets appear legitimate and appealing to unsuspecting internet users because of a lack of prior knowledge of the novel pandemic. Such news can be misleading, counterproductive, unethical, and unprofessional, and sometimes constitutes a clog in the wheel of global efforts toward flatteni...

Bibliographic Details
Main Authors: Olaleye, T.O., Arogundade, O.T., Abayomi-Alli, A., Adesemowo, A.K.
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137711/
http://dx.doi.org/10.1016/B978-0-12-824536-1.00004-6
author Olaleye, T.O.
Arogundade, O.T.
Abayomi-Alli, A.
Adesemowo, A.K.
collection PubMed
description Fake COVID-19 tweets appear legitimate and appealing to unsuspecting internet users because of a lack of prior knowledge of the novel pandemic. Such news can be misleading, counterproductive, unethical, and unprofessional, and sometimes constitutes a clog in the wheel of global efforts toward flattening the virus spread curve. Therefore, aside from the COVID-19 pandemic itself, dealing with fake news and myths about the virus constitutes an infodemic issue that must be tackled to ensure that only valid information is consumed by the public. This chapter aims at a predictive analytics of COVID-19 infodemic tweets that generates a classification rule and validates genuine information from verified, accredited health institutions and sources. Classifier Vote ensembles formed from the base classifiers SMO, Voted Perceptron, Liblinear, Reptree, and Decision Stump were deployed on a tokenized bag-of-words dataset of 81,456 words, which encapsulates 2964 COVID-19 tweet instances and 3169 extracted numeric vector attributes. The experimental result shows a novel 99.93% prediction accuracy on 10-fold cross validation, while the information gain of each of the 3169 extracted attributes is ranked to ascertain the most significant COVID-19 tweet words for the detection system. Other performance metrics, including ROC area and Relief-F, validate the reliability of the model and return SMO as the most efficient base classifier. The thrust of the model centers more on the trustworthiness of the COVID-19 tweet source than on the truthfulness of the tweet, which underscores the prominence of verified health institutions and contributes to the discourse on the inhibition and impact of fake news, especially during societal pandemics. The COVID-19 infodemic detection algorithm provides insight into a new spin on fake news in the age of social media and the era of pandemics.
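The description mentions ranking bag-of-words attributes by information gain. As a rough illustration only, the following Python sketch builds word-count vectors for a handful of invented tweets and ranks each word by the information gain of its presence or absence with respect to the class label. The tweets, labels, and all function names are hypothetical and are not taken from the chapter; the chapter's own dataset, toolchain, and ensemble classifiers are not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into alphanumeric word tokens."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return cleaned.split()


def bag_of_words(tweets):
    """Return (vocabulary, count vectors), one vector per tweet."""
    vocab = sorted({w for t in tweets for w in tokenize(t)})
    vectors = []
    for t in tweets:
        counts = Counter(tokenize(t))
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of splitting on word presence (count > 0) vs. absence."""
    present = [y for x, y in zip(column, labels) if x > 0]
    absent = [y for x, y in zip(column, labels) if x == 0]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def rank_words(tweets, labels):
    """Rank vocabulary words by information gain, highest first."""
    vocab, vectors = bag_of_words(tweets)
    scores = []
    for j, word in enumerate(vocab):
        column = [v[j] for v in vectors]
        scores.append((word, information_gain(column, labels)))
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data, invented purely for illustration.
tweets = [
    "WHO advises washing hands",
    "garlic cures covid",
    "WHO confirms vaccine trial",
    "miracle garlic tea cures virus",
]
labels = ["genuine", "fake", "genuine", "fake"]

ranked = rank_words(tweets, labels)
for word, gain in ranked[:5]:
    print(f"{word}: {gain:.4f}")
```

In this toy set, words that appear only in one class separate the labels perfectly and so receive the highest gain. A real study would apply the same idea to thousands of attributes and feed the vectors to trained classifiers evaluated with cross validation.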
format Online
Article
Text
id pubmed-8137711
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-8137711 2021-05-21 An ensemble predictive analytics of COVID-19 infodemic tweets using bag of words Olaleye, T.O. Arogundade, O.T. Abayomi-Alli, A. Adesemowo, A.K. Data Science for COVID-19 Article 2021 2021-05-21 /pmc/articles/PMC8137711/ http://dx.doi.org/10.1016/B978-0-12-824536-1.00004-6 Text en
Copyright © 2021 Elsevier Inc. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title An ensemble predictive analytics of COVID-19 infodemic tweets using bag of words
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137711/
http://dx.doi.org/10.1016/B978-0-12-824536-1.00004-6