Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter

Bibliographic Details
Main authors:	Morita, Plinio Pelegrini; Zakir Hussain, Irfhana; Kaur, Jasleen; Lotto, Matheus; Butt, Zahid Ahmad
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2023
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337356/ https://www.ncbi.nlm.nih.gov/pubmed/37294603 http://dx.doi.org/10.2196/44356

collection	PubMed
description	BACKGROUND: Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a comprehensive system capable of mining and analyzing large volumes of social media data in real time.

OBJECTIVE: This study aimed to design and develop a big data pipeline and ecosystem (UbiLab Misinformation Analysis System [U-MAS]) to identify and analyze false or misleading information disseminated via social media on a given topic or set of related topics.

METHODS: U-MAS is a platform-independent ecosystem developed in Python that leverages the Twitter V2 application programming interface and the Elastic Stack. The U-MAS expert system has 5 major components: data extraction framework, latent Dirichlet allocation (LDA) topic model, sentiment analyzer, misinformation classification model, and Elastic Cloud deployment (indexing of data and visualizations). The data extraction framework queries the data through the Twitter V2 application programming interface, with queries identified by public health experts. The LDA topic model, sentiment analyzer, and misinformation classification model are independently trained using a small, expert-validated subset of the extracted data. These models are then incorporated into U-MAS to analyze and classify the remaining data. Finally, the analyzed data are loaded into an index in the Elastic Cloud deployment and can then be presented on dashboards with advanced visualizations and analytics pertinent to infodemiology and infoveillance analysis.

RESULTS: U-MAS performed efficiently and accurately. Independent investigators have successfully used the system to extract significant insights into a fluoride-related health misinformation use case (2016 to 2021). The system is currently used for a vaccine hesitancy use case (2007 to 2022) and a heat wave–related illnesses use case (2011 to 2022). Each component in the system for the fluoride misinformation use case performed as expected. The data extraction framework handles large amounts of data within short periods. The LDA topic models achieved relatively high coherence values (0.54), and the predicted topics were accurate and well suited to the data. The sentiment analyzer achieved a correlation coefficient of 0.72 but could be improved in further iterations. The misinformation classifier attained a satisfactory correlation coefficient of 0.82 against expert-validated data. Moreover, the output dashboard and analytics hosted on the Elastic Cloud deployment are intuitive for researchers without a technical background and comprehensive in their visualization and analytics capabilities. The investigators of the fluoride misinformation use case have used the system to extract important public health insights, which have been published separately.

CONCLUSIONS: The novel U-MAS pipeline has the potential to detect and analyze misleading information related to a particular topic or set of related topics.
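The METHODS paragraph describes a linear flow: extract tweets, run the topic model, sentiment analyzer, and misinformation classifier, then index the results. The following minimal Python sketch illustrates that flow under loudly stated assumptions: every function, lexicon, index name, tweet, and label is hypothetical; the topic, sentiment, and classification stages are trivial keyword rules standing in for the trained LDA model and classifiers (they are not the U-MAS implementation); and the last stage only builds an Elasticsearch `_bulk` request body instead of contacting an Elastic Cloud deployment. The abstract reports correlation coefficients without naming which coefficient, so the Pearson coefficient used here is also an assumption.

```python
import json
import math
from dataclasses import dataclass, field

# Hypothetical toy lexicons and cue lists; U-MAS trains its models on expert-validated data instead.
POSITIVE_WORDS = {"safe", "healthy", "protects"}
NEGATIVE_WORDS = {"poison", "toxic", "dangerous"}
MISINFORMATION_CUES = ("poison", "toxic", "mind control")
TOPIC_KEYWORDS = {"water": "community water fluoridation", "toothpaste": "dental products"}


@dataclass
class Tweet:
    tweet_id: str
    text: str
    labels: dict = field(default_factory=dict)


def extract(tweets, query_terms):
    """Stage 1 stand-in: keep tweets containing any expert-chosen query term.
    (U-MAS queries the Twitter V2 API instead of filtering a local list.)"""
    return [t for t in tweets if any(term in t.text.lower() for term in query_terms)]


def assign_topic(tweet):
    """Stage 2 stand-in: first matching keyword wins. This is NOT latent Dirichlet allocation."""
    text = tweet.text.lower()
    tweet.labels["topic"] = next(
        (topic for keyword, topic in TOPIC_KEYWORDS.items() if keyword in text), "other"
    )


def score_sentiment(tweet):
    """Stage 3 stand-in: lexicon score in [-1, 1] over whitespace-separated words."""
    words = set(tweet.text.lower().split())
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    tweet.labels["sentiment"] = 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def classify_misinformation(tweet):
    """Stage 4 stand-in: 1 if any cue phrase occurs in the text, else 0."""
    text = tweet.text.lower()
    tweet.labels["misinformation"] = int(any(cue in text for cue in MISINFORMATION_CUES))


def to_bulk_ndjson(tweets, index_name):
    """Stage 5 stand-in: build an Elasticsearch _bulk body (one action line, one document line per tweet)."""
    lines = []
    for t in tweets:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": t.tweet_id}}))
        lines.append(json.dumps({"text": t.text, **t.labels}))
    return "\n".join(lines) + "\n"  # the _bulk API expects a trailing newline


def run_pipeline(tweets, query_terms, index_name):
    """Chain the stages in the order the abstract lists them."""
    selected = extract(tweets, query_terms)
    for stage in (assign_topic, score_sentiment, classify_misinformation):
        for t in selected:
            stage(t)
    return selected, to_bulk_ndjson(selected, index_name)


def pearson(xs, ys):
    """Pearson correlation between model outputs and expert labels (assumed metric)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    raw = [
        Tweet("1", "Fluoride in tap water is poison"),
        Tweet("2", "Fluoride toothpaste is safe and protects teeth"),
        Tweet("3", "Nice weather today"),
        Tweet("4", "Is fluoride toxic? Asking my dentist"),
    ]
    processed, body = run_pipeline(raw, ["fluoride"], "hypothetical-fluoride-index")
    print(body)
    expert_labels = [1, 0, 0]  # hypothetical expert annotations for tweets 1, 2, 4
    predicted = [t.labels["misinformation"] for t in processed]
    print(pearson(predicted, expert_labels))
```

Run as a script, the example keeps tweets 1, 2, and 4, prints a six-line `_bulk` body, and reports a correlation of roughly 0.5, because the keyword rule flags tweet 4, which the hypothetical expert label marks as not misinformation. Each stage writes into one shared `labels` dict so that the document indexed in the final stage carries every model output, in line with the abstract's description of analyzed data being loaded into a single index.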
id	pubmed-10337356
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	J Med Internet Res (published 2023-06-09)
rights	©Plinio Pelegrini Morita, Irfhana Zakir Hussain, Jasleen Kaur, Matheus Lotto, Zahid Ahmad Butt. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 09.06.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.