Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram

BACKGROUND: The coronavirus disease (COVID-19) pandemic is perhaps the greatest global health challenge of the last century. Accompanying this pandemic is a parallel “infodemic,” including the online marketing and sale of unapproved, illegal, and counterfeit COVID-19 health products including testin...

Bibliographic Details
Main authors: Mackey, Tim Ken, Li, Jiawei, Purushothaman, Vidya, Nali, Matthew, Shah, Neal, Bardier, Cortni, Cai, Mingxiang, Liang, Bryan
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7451110/
https://www.ncbi.nlm.nih.gov/pubmed/32750006
http://dx.doi.org/10.2196/20794
author Mackey, Tim Ken
Li, Jiawei
Purushothaman, Vidya
Nali, Matthew
Shah, Neal
Bardier, Cortni
Cai, Mingxiang
Liang, Bryan
collection PubMed
description BACKGROUND: The coronavirus disease (COVID-19) pandemic is perhaps the greatest global health challenge of the last century. Accompanying this pandemic is a parallel “infodemic,” including the online marketing and sale of unapproved, illegal, and counterfeit COVID-19 health products including testing kits, treatments, and other questionable “cures.” Enabling the proliferation of this content is the growing ubiquity of internet-based technologies, including popular social media platforms that now have billions of global users.
OBJECTIVE: This study aims to collect, analyze, identify, and enable reporting of suspected fake, counterfeit, and unapproved COVID-19–related health care products from Twitter and Instagram.
METHODS: This study is conducted in two phases beginning with the collection of COVID-19–related Twitter and Instagram posts using a combination of web scraping on Instagram and filtering the public streaming Twitter application programming interface for keywords associated with suspect marketing and sale of COVID-19 products. The second phase involved data analysis using natural language processing (NLP) and deep learning to identify potential sellers that were then manually annotated for characteristics of interest. We also visualized illegal selling posts on a customized data dashboard to enable public health intelligence.
RESULTS: We collected a total of 6,029,323 tweets and 204,597 Instagram posts filtered for terms associated with suspect marketing and sale of COVID-19 health products from March to April for Twitter and February to May for Instagram. After applying our NLP and deep learning approaches, we identified 1271 tweets and 596 Instagram posts associated with questionable sales of COVID-19–related products. Generally, product introduction came in two waves, with the first consisting of questionable immunity-boosting treatments and a second involving suspect testing kits. We also detected a low volume of pharmaceuticals that have not been approved for COVID-19 treatment. Other major themes detected included products offered in different languages, various claims of product credibility, completely unsubstantiated products, unapproved testing modalities, and different payment and seller contact methods.
CONCLUSIONS: Results from this study provide initial insight into one front of the “infodemic” fight against COVID-19 by characterizing what types of health products, selling claims, and types of sellers were active on two popular social media platforms at earlier stages of the pandemic. This cybercrime challenge is likely to continue as the pandemic progresses and more people seek access to COVID-19 testing and treatment. This data intelligence can help public health agencies, regulatory authorities, legitimate manufacturers, and technology platforms better remove and prevent this content from harming the public.
format Online
Article
Text
id pubmed-7451110
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7451110 2020-08-31 JMIR Public Health Surveill Original Paper JMIR Publications 2020-08-25 /pmc/articles/PMC7451110/ /pubmed/32750006 http://dx.doi.org/10.2196/20794 Text en
©Tim Ken Mackey, Jiawei Li, Vidya Purushothaman, Matthew Nali, Neal Shah, Cortni Bardier, Mingxiang Cai, Bryan Liang. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 25.08.2020.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
title Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram
topic Original Paper