Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter

BACKGROUND: Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a...

Bibliographic Details
Main authors: Morita, Plinio Pelegrini, Zakir Hussain, Irfhana, Kaur, Jasleen, Lotto, Matheus, Butt, Zahid Ahmad
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337356/
https://www.ncbi.nlm.nih.gov/pubmed/37294603
http://dx.doi.org/10.2196/44356
_version_ 1785071404974080000
author Morita, Plinio Pelegrini
Zakir Hussain, Irfhana
Kaur, Jasleen
Lotto, Matheus
Butt, Zahid Ahmad
author_facet Morita, Plinio Pelegrini
Zakir Hussain, Irfhana
Kaur, Jasleen
Lotto, Matheus
Butt, Zahid Ahmad
author_sort Morita, Plinio Pelegrini
collection PubMed
description BACKGROUND: Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a comprehensive system capable of mining and analyzing large volumes of social media data in real time.
OBJECTIVE: This study aimed to design and develop a big data pipeline and ecosystem (UbiLab Misinformation Analysis System [U-MAS]) to identify and analyze false or misleading information disseminated via social media on a certain topic or set of related topics.
METHODS: U-MAS is a platform-independent ecosystem developed in Python that leverages the Twitter V2 application programming interface and the Elastic Stack. The U-MAS expert system has 5 major components: data extraction framework, latent Dirichlet allocation (LDA) topic model, sentiment analyzer, misinformation classification model, and Elastic Cloud deployment (indexing of data and visualizations). The data extraction framework queries the data through the Twitter V2 application programming interface, with queries identified by public health experts. The LDA topic model, sentiment analyzer, and misinformation classification model are independently trained using a small, expert-validated subset of the extracted data. These models are then incorporated into U-MAS to analyze and classify the remaining data. Finally, the analyzed data are loaded into an index in the Elastic Cloud deployment and can then be presented on dashboards with advanced visualizations and analytics pertinent to infodemiology and infoveillance analysis.
RESULTS: U-MAS performed efficiently and accurately. Independent investigators have successfully used the system to extract significant insights into a fluoride-related health misinformation use case (2016 to 2021). The system is currently used for a vaccine hesitancy use case (2007 to 2022) and a heat wave–related illnesses use case (2011 to 2022). Each component in the system for the fluoride misinformation use case performed as expected. The data extraction framework handles large amounts of data within short periods. The LDA topic models achieved relatively high coherence values (0.54), and the predicted topics were accurate and befitting to the data. The sentiment analyzer performed at a correlation coefficient of 0.72 but could be improved in further iterations. The misinformation classifier attained a satisfactory correlation coefficient of 0.82 against expert-validated data. Moreover, the output dashboard and analytics hosted on the Elastic Cloud deployment are intuitive for researchers without a technical background and comprehensive in their visualization and analytics capabilities. In fact, the investigators of the fluoride misinformation use case have successfully used the system to extract interesting and important insights into public health, which have been published separately.
CONCLUSIONS: The novel U-MAS pipeline has the potential to detect and analyze misleading information related to a particular topic or set of related topics.
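The RESULTS above report correlation coefficients between model output and expert-validated data (0.72 for the sentiment analyzer, 0.82 for the misinformation classifier). The abstract does not say which coefficient was used, so the following is only a minimal sketch that assumes a Pearson correlation. The function name `pearson` and the label lists are hypothetical illustrations and are not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up labels for illustration only (1 = misinformation, 0 = not).
expert_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_labels = [1, 0, 1, 0, 0, 0, 1, 1]

r = pearson(expert_labels, model_labels)
print(f"correlation with expert labels: {r:.2f}")
```

For binary labels like these, the Pearson coefficient coincides with the phi coefficient, so a single function covers both a continuous sentiment score and a yes/no misinformation label.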
format Online
Article
Text
id pubmed-10337356
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10337356 2023-07-13 Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter Morita, Plinio Pelegrini Zakir Hussain, Irfhana Kaur, Jasleen Lotto, Matheus Butt, Zahid Ahmad J Med Internet Res Original Paper JMIR Publications 2023-06-09 /pmc/articles/PMC10337356/ /pubmed/37294603 http://dx.doi.org/10.2196/44356 Text en ©Plinio Pelegrini Morita, Irfhana Zakir Hussain, Jasleen Kaur, Matheus Lotto, Zahid Ahmad Butt.
Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 09.06.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Morita, Plinio Pelegrini
Zakir Hussain, Irfhana
Kaur, Jasleen
Lotto, Matheus
Butt, Zahid Ahmad
Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter
title Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter
title_full Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter
title_fullStr Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter
title_full_unstemmed Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter
title_short Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter
title_sort tweeting for health using real-time mining and artificial intelligence–based analytics: design and development of a big data ecosystem for detecting and analyzing misinformation on twitter
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337356/
https://www.ncbi.nlm.nih.gov/pubmed/37294603
http://dx.doi.org/10.2196/44356