Interdisciplinary Approach to Identify and Characterize COVID-19 Misinformation on Twitter: Mixed Methods Study

BACKGROUND: Studying COVID-19 misinformation on Twitter presents methodological challenges. A computational approach can analyze large data sets, but it is limited when interpreting context. A qualitative approach allows for a deeper analysis of content, but it is labor-intensive and feasible only f...

Bibliographic Details
Main Authors: Isip Tan, Iris Thiele; Cleofas, Jerome; Solano, Geoffrey; Pillejera, Jeanne Genevive; Catapang, Jasper Kyle
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337476/
https://www.ncbi.nlm.nih.gov/pubmed/37220196
http://dx.doi.org/10.2196/41134
_version_ 1785071433549873152
author Isip Tan, Iris Thiele
Cleofas, Jerome
Solano, Geoffrey
Pillejera, Jeanne Genevive
Catapang, Jasper Kyle
collection PubMed
description BACKGROUND: Studying COVID-19 misinformation on Twitter presents methodological challenges. A computational approach can analyze large data sets, but it is limited when interpreting context. A qualitative approach allows for a deeper analysis of content, but it is labor-intensive and feasible only for smaller data sets.
OBJECTIVE: We aimed to identify and characterize tweets containing COVID-19 misinformation.
METHODS: Tweets geolocated to the Philippines (January 1 to March 21, 2020) containing the words coronavirus, covid, and ncov were mined using the GetOldTweets3 Python library. This primary corpus (N=12,631) was subjected to biterm topic modeling. Key informant interviews were conducted to elicit examples of COVID-19 misinformation and determine keywords. Using NVivo (QSR International) and a combination of word frequency and text search using key informant interview keywords, subcorpus A (n=5881) was constituted and manually coded to identify misinformation. Constant comparative, iterative, and consensual analyses were used to further characterize these tweets. Tweets containing key informant interview keywords were extracted from the primary corpus and processed to constitute subcorpus B (n=4634), of which 506 tweets were manually labeled as misinformation. This training set was subjected to natural language processing to identify tweets with misinformation in the primary corpus. These tweets were further manually coded to confirm labeling.
RESULTS: Biterm topic modeling of the primary corpus revealed the following topics: uncertainty, lawmaker’s response, safety measures, testing, loved ones, health standards, panic buying, tragedies other than COVID-19, economy, COVID-19 statistics, precautions, health measures, international issues, adherence to guidelines, and frontliners. These were categorized into 4 major topics: nature of COVID-19, contexts and consequences, people and agents of COVID-19, and COVID-19 prevention and management. Manual coding of subcorpus A identified 398 tweets with misinformation in the following formats: misleading content (n=179), satire and/or parody (n=77), false connection (n=53), conspiracy (n=47), and false context (n=42). The discursive strategies identified were humor (n=109), fear mongering (n=67), anger and disgust (n=59), political commentary (n=59), performing credibility (n=45), overpositivity (n=32), and marketing (n=27). Natural language processing identified 165 tweets with misinformation. However, a manual review showed that 69.7% (115/165) of tweets did not contain misinformation.
CONCLUSIONS: An interdisciplinary approach was used to identify tweets with COVID-19 misinformation. Natural language processing mislabeled tweets, likely due to tweets written in Filipino or a combination of the Filipino and English languages. Identifying the formats and discursive strategies of tweets with misinformation required iterative, manual, and emergent coding by human coders with experiential and cultural knowledge of Twitter. An interdisciplinary team composed of experts in health, health informatics, social science, and computer science combined computational and qualitative methods to gain a better understanding of COVID-19 misinformation on Twitter.
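The keyword search, the biterm idea behind the topic model, and the manual-review percentage described in the abstract can be illustrated with a small Python sketch. This is not the authors' code. The function names and the sample tweets are hypothetical, the keyword match is assumed to mean "any of the three terms, case-insensitive", and the biterm helper uses the simplified notion of a biterm as an unordered pair of distinct words in one short text. Only the keywords and the counts 115 and 165 come from the record.

```python
import re
from itertools import combinations

# Search terms named in the abstract.
KEYWORDS = ("coronavirus", "covid", "ncov")


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if the text contains any keyword, ignoring case (assumed rule)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def biterms(text):
    """Return the unordered pairs of distinct words that co-occur in one short text."""
    tokens = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    return list(combinations(tokens, 2))


def share_not_misinformation(flagged, rejected_on_review):
    """Percentage of machine-flagged tweets that manual review rejected."""
    return round(100 * rejected_on_review / flagged, 1)


# Hypothetical tweets, invented for illustration only.
sample = [
    "Wash your hands to stop COVID",
    "Traffic is terrible today",
    "nCoV update from the health department",
]
corpus = [tweet for tweet in sample if matches_keywords(tweet)]

print(len(corpus))                         # tweets kept by the keyword filter
print(biterms("stay home"))                # word pairs from one short text
print(share_not_misinformation(165, 115))  # counts reported in the abstract
```

With the reported counts, 115 of the 165 flagged tweets were rejected on review, which reproduces the 69.7% figure and implies that the remaining 50 tweets (about 30.3%) were confirmed as misinformation.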
format Online
Article
Text
id pubmed-10337476
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10337476 2023-07-13 JMIR Form Res Original Paper JMIR Publications 2023-06-28 /pmc/articles/PMC10337476/ /pubmed/37220196 http://dx.doi.org/10.2196/41134 Text en ©Iris Thiele Isip Tan, Jerome Cleofas, Geoffrey Solano, Jeanne Genevive Pillejera, Jasper Kyle Catapang. Originally published in JMIR Formative Research (https://formative.jmir.org), 28.06.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
title Interdisciplinary Approach to Identify and Characterize COVID-19 Misinformation on Twitter: Mixed Methods Study
topic Original Paper