Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions

Bibliographic Details
Main authors: Du, Jingcheng; Preston, Sharice; Sun, Hanxiao; Shegog, Ross; Cunningham, Rachel; Boom, Julie; Savas, Lara; Amith, Muhammad; Tao, Cui
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8380585/
https://www.ncbi.nlm.nih.gov/pubmed/34383667
http://dx.doi.org/10.2196/26478
_version_ 1783741226559733760
author Du, Jingcheng
Preston, Sharice
Sun, Hanxiao
Shegog, Ross
Cunningham, Rachel
Boom, Julie
Savas, Lara
Amith, Muhammad
Tao, Cui
author_sort Du, Jingcheng
collection PubMed
description BACKGROUND: The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information, thus creating obstacles for vaccine promotion. OBJECTIVE: The aim of this study is to develop and evaluate an intelligent automated protocol for identifying and classifying human papillomavirus (HPV) vaccine misinformation on social media using machine learning (ML)–based methods. METHODS: Reddit posts (from 2007 to 2017, N=28,121) that contained keywords related to HPV vaccination were compiled. A random subset (2200/28,121, 7.82%) was manually labeled for misinformation and served as the gold standard corpus for evaluation. A total of 5 ML-based algorithms, including a support vector machine, logistic regression, extremely randomized trees, a convolutional neural network, and a recurrent neural network designed to identify vaccine misinformation, were evaluated for identification performance. Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation. RESULTS: A convolutional neural network model achieved the highest area under the receiver operating characteristic curve of 0.7943. Of the 28,121 Reddit posts, 7207 (25.63%) were classified as vaccine misinformation, with discussions about general safety issues identified as the leading type of misinformed posts (2666/7207, 36.99%). CONCLUSIONS: ML-based approaches are effective in the identification and classification of HPV vaccine misinformation on Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity and utility to meet the challenge involved in intelligent automated monitoring and classification of public health misinformation on social media platforms. The timely identification of vaccine misinformation on the internet is the first step in misinformation correction and vaccine promotion.
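
The figures reported in the abstract above can be illustrated with a short sketch. It is not the authors' code: the helper names `roc_auc` and `share` and the toy labels and scores are hypothetical, chosen only to show how an area under the receiver operating characteristic curve is computed and how the reported percentages follow from the reported counts.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    The result is the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Hypothetical toy data (NOT from the study): 1 = misinformation, 0 = other.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.5]
toy_auc = roc_auc(toy_labels, toy_scores)

# Counts taken from the abstract; the percentages are recomputed from them.
labeled_share = share(2200, 28121)         # manually labeled subset
misinformation_share = share(7207, 28121)  # posts classified as misinformation
safety_share = share(2666, 7207)           # general safety issues among those

print(toy_auc, labeled_share, misinformation_share, safety_share)
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported value of 0.7943 should be read.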
format Online
Article
Text
id pubmed-8380585
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8380585 2021-09-02 J Med Internet Res Original Paper JMIR Publications 2021-08-05 /pmc/articles/PMC8380585/ /pubmed/34383667 http://dx.doi.org/10.2196/26478 Text en ©Jingcheng Du, Sharice Preston, Hanxiao Sun, Ross Shegog, Rachel Cunningham, Julie Boom, Lara Savas, Muhammad Amith, Cui Tao. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 05.08.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8380585/
https://www.ncbi.nlm.nih.gov/pubmed/34383667
http://dx.doi.org/10.2196/26478