Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions
BACKGROUND: The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information, thus creating obstacles for vaccine promotion. OBJECTIVE: The aim of this study is to develop and evaluate an intelligent automated protocol for iden...
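Main authors: | Du, Jingcheng; Preston, Sharice; Sun, Hanxiao; Shegog, Ross; Cunningham, Rachel; Boom, Julie; Savas, Lara; Amith, Muhammad; Tao, Cui |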
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8380585/ https://www.ncbi.nlm.nih.gov/pubmed/34383667 http://dx.doi.org/10.2196/26478 |
_version_ | 1783741226559733760 |
---|---|
author | Du, Jingcheng Preston, Sharice Sun, Hanxiao Shegog, Ross Cunningham, Rachel Boom, Julie Savas, Lara Amith, Muhammad Tao, Cui |
author_facet | Du, Jingcheng Preston, Sharice Sun, Hanxiao Shegog, Ross Cunningham, Rachel Boom, Julie Savas, Lara Amith, Muhammad Tao, Cui |
author_sort | Du, Jingcheng |
collection | PubMed |
description | BACKGROUND: The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information, thus creating obstacles for vaccine promotion. OBJECTIVE: The aim of this study is to develop and evaluate an intelligent automated protocol for identifying and classifying human papillomavirus (HPV) vaccine misinformation on social media using machine learning (ML)–based methods. METHODS: Reddit posts (from 2007 to 2017, N=28,121) that contained keywords related to HPV vaccination were compiled. A random subset (2200/28,121, 7.82%) was manually labeled for misinformation and served as the gold standard corpus for evaluation. A total of 5 ML-based algorithms, including a support vector machine, logistic regression, extremely randomized trees, a convolutional neural network, and a recurrent neural network designed to identify vaccine misinformation, were evaluated for identification performance. Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation. RESULTS: A convolutional neural network model achieved the highest area under the receiver operating characteristic curve of 0.7943. Of the 28,121 Reddit posts, 7207 (25.63%) were classified as vaccine misinformation, with discussions about general safety issues identified as the leading type of misinformed posts (2666/7207, 36.99%). CONCLUSIONS: ML-based approaches are effective in the identification and classification of HPV vaccine misinformation on Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity and utility to meet the challenge involved in intelligent automated monitoring and classification of public health misinformation on social media platforms. The timely identification of vaccine misinformation on the internet is the first step in misinformation correction and vaccine promotion. |
format | Online Article Text |
id | pubmed-8380585 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-83805852021-09-02 Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions Du, Jingcheng Preston, Sharice Sun, Hanxiao Shegog, Ross Cunningham, Rachel Boom, Julie Savas, Lara Amith, Muhammad Tao, Cui J Med Internet Res Original Paper BACKGROUND: The rapid growth of social media as an information channel has made it possible to quickly spread inaccurate or false vaccine information, thus creating obstacles for vaccine promotion. OBJECTIVE: The aim of this study is to develop and evaluate an intelligent automated protocol for identifying and classifying human papillomavirus (HPV) vaccine misinformation on social media using machine learning (ML)–based methods. METHODS: Reddit posts (from 2007 to 2017, N=28,121) that contained keywords related to HPV vaccination were compiled. A random subset (2200/28,121, 7.82%) was manually labeled for misinformation and served as the gold standard corpus for evaluation. A total of 5 ML-based algorithms, including a support vector machine, logistic regression, extremely randomized trees, a convolutional neural network, and a recurrent neural network designed to identify vaccine misinformation, were evaluated for identification performance. Topic modeling was applied to identify the major categories associated with HPV vaccine misinformation. RESULTS: A convolutional neural network model achieved the highest area under the receiver operating characteristic curve of 0.7943. Of the 28,121 Reddit posts, 7207 (25.63%) were classified as vaccine misinformation, with discussions about general safety issues identified as the leading type of misinformed posts (2666/7207, 36.99%). CONCLUSIONS: ML-based approaches are effective in the identification and classification of HPV vaccine misinformation on Reddit and may be generalizable to other social media platforms. 
ML-based methods may provide the capacity and utility to meet the challenge involved in intelligent automated monitoring and classification of public health misinformation on social media platforms. The timely identification of vaccine misinformation on the internet is the first step in misinformation correction and vaccine promotion. JMIR Publications 2021-08-05 /pmc/articles/PMC8380585/ /pubmed/34383667 http://dx.doi.org/10.2196/26478 Text en ©Jingcheng Du, Sharice Preston, Hanxiao Sun, Ross Shegog, Rachel Cunningham, Julie Boom, Lara Savas, Muhammad Amith, Cui Tao. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 05.08.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Du, Jingcheng Preston, Sharice Sun, Hanxiao Shegog, Ross Cunningham, Rachel Boom, Julie Savas, Lara Amith, Muhammad Tao, Cui Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions |
title | Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions |
title_full | Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions |
title_fullStr | Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions |
title_full_unstemmed | Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions |
title_short | Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions |
title_sort | using machine learning–based approaches for the detection and classification of human papillomavirus vaccine misinformation: infodemiology study of reddit discussions |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8380585/ https://www.ncbi.nlm.nih.gov/pubmed/34383667 http://dx.doi.org/10.2196/26478 |