Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter
BACKGROUND: Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a...
Main authors: | Morita, Plinio Pelegrini; Zakir Hussain, Irfhana; Kaur, Jasleen; Lotto, Matheus; Butt, Zahid Ahmad |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337356/ https://www.ncbi.nlm.nih.gov/pubmed/37294603 http://dx.doi.org/10.2196/44356 |
_version_ | 1785071404974080000 |
---|---|
author | Morita, Plinio Pelegrini; Zakir Hussain, Irfhana; Kaur, Jasleen; Lotto, Matheus; Butt, Zahid Ahmad |
collection | PubMed |
description | BACKGROUND: Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a comprehensive system capable of mining and analyzing large volumes of social media data in real time. OBJECTIVE: This study aimed to design and develop a big data pipeline and ecosystem (UbiLab Misinformation Analysis System [U-MAS]) to identify and analyze false or misleading information disseminated via social media on a certain topic or set of related topics. METHODS: U-MAS is a platform-independent ecosystem developed in Python that leverages the Twitter V2 application programming interface and the Elastic Stack. The U-MAS expert system has 5 major components: data extraction framework, latent Dirichlet allocation (LDA) topic model, sentiment analyzer, misinformation classification model, and Elastic Cloud deployment (indexing of data and visualizations). The data extraction framework queries the data through the Twitter V2 application programming interface, with queries identified by public health experts. The LDA topic model, sentiment analyzer, and misinformation classification model are independently trained using a small, expert-validated subset of the extracted data. These models are then incorporated into U-MAS to analyze and classify the remaining data. Finally, the analyzed data are loaded into an index in the Elastic Cloud deployment and can then be presented on dashboards with advanced visualizations and analytics pertinent to infodemiology and infoveillance analysis. RESULTS: U-MAS performed efficiently and accurately. Independent investigators have successfully used the system to extract significant insights into a fluoride-related health misinformation use case (2016 to 2021). The system is currently used for a vaccine hesitancy use case (2007 to 2022) and a heat wave–related illnesses use case (2011 to 2022). Each component in the system for the fluoride misinformation use case performed as expected. The data extraction framework handles large amounts of data within short periods. The LDA topic models achieved relatively high coherence values (0.54), and the predicted topics were accurate and befitting to the data. The sentiment analyzer performed at a correlation coefficient of 0.72 but could be improved in further iterations. The misinformation classifier attained a satisfactory correlation coefficient of 0.82 against expert-validated data. Moreover, the output dashboard and analytics hosted on the Elastic Cloud deployment are intuitive for researchers without a technical background and comprehensive in their visualization and analytics capabilities. In fact, the investigators of the fluoride misinformation use case have successfully used the system to extract interesting and important insights into public health, which have been published separately. CONCLUSIONS: The novel U-MAS pipeline has the potential to detect and analyze misleading information related to a particular topic or set of related topics. |
format | Online Article Text |
id | pubmed-10337356 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-10337356 2023-07-13 J Med Internet Res Original Paper JMIR Publications 2023-06-09 /pmc/articles/PMC10337356/ /pubmed/37294603 http://dx.doi.org/10.2196/44356 Text en ©Plinio Pelegrini Morita, Irfhana Zakir Hussain, Jasleen Kaur, Matheus Lotto, Zahid Ahmad Butt. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 09.06.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. |
title | Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter |
topic | Original Paper |
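
The abstract describes U-MAS as five components chained together: data extraction, topic modeling, sentiment analysis, misinformation classification, and indexing for dashboards. The sketch below only illustrates that data flow and is not the authors' implementation. Every name in it (`Tweet`, `extract`, `toy_topic`, `toy_sentiment`, `toy_misinformation`, `run_pipeline`, `SAMPLE`) is hypothetical, the three analyzers are trivial keyword rules standing in for the trained LDA, sentiment, and classification models, and an in-memory list stands in for both the Twitter V2 API and the Elastic Cloud index.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Tweet:
    tweet_id: str
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


def extract(source: Iterable[Tweet], query_terms: List[str]) -> List[Tweet]:
    # Stand-in for the data extraction framework. A real system would query
    # the Twitter V2 API; here we filter an in-memory list by query terms.
    terms = [term.lower() for term in query_terms]
    return [t for t in source if any(term in t.text.lower() for term in terms)]


def toy_topic(tweet: Tweet) -> str:
    # Placeholder for the LDA topic model (a keyword lookup, not LDA).
    return "water" if "water" in tweet.text.lower() else "other"


def toy_sentiment(tweet: Tweet) -> int:
    # Placeholder for the sentiment analyzer: positive minus negative words.
    words = tweet.text.lower().split()
    positive = sum(w in {"good", "safe"} for w in words)
    negative = sum(w in {"bad", "scary"} for w in words)
    return positive - negative


def toy_misinformation(tweet: Tweet) -> bool:
    # Placeholder for the trained classifier: flags one hypothetical phrase.
    return "secret cure" in tweet.text.lower()


# Assumed stage order: each stage adds one annotation to the tweet.
STAGES: Dict[str, Callable[[Tweet], object]] = {
    "topic": toy_topic,
    "sentiment": toy_sentiment,
    "misinformation": toy_misinformation,
}


def run_pipeline(source: Iterable[Tweet], query_terms: List[str]) -> List[dict]:
    index: List[dict] = []  # stand-in for an index in an Elastic deployment
    for tweet in extract(source, query_terms):
        for name, stage in STAGES.items():
            tweet.annotations[name] = stage(tweet)
        index.append({"id": tweet.tweet_id, "text": tweet.text, **tweet.annotations})
    return index


# Made-up sample records used only to exercise the sketch.
SAMPLE = [
    Tweet("1", "Fluoride in water is safe"),
    Tweet("2", "Fluoride is a secret cure, scary"),
    Tweet("3", "Nice weather today"),
]

if __name__ == "__main__":
    for doc in run_pipeline(SAMPLE, ["fluoride"]):
        print(doc)
```

Keeping each analyzer behind the same `Tweet -> value` signature mirrors the abstract's point that the models are trained independently and then incorporated into the system. Under this assumption, swapping a keyword rule for a trained model changes one entry in `STAGES` and nothing else.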