
Characterizing the Prevalence of Obesity Misinformation, Factual Content, Stigma, and Positivity on the Social Media Platform Reddit Between 2011 and 2019: Infodemiology Study



Bibliographic Details
Main Authors: Pollack, Catherine C, Emond, Jennifer A, O'Malley, A James, Byrd, Anna, Green, Peter, Miller, Katherine E, Vosoughi, Soroush, Gilbert-Diamond, Diane, Onega, Tracy
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9840103/
https://www.ncbi.nlm.nih.gov/pubmed/36583929
http://dx.doi.org/10.2196/36729
author Pollack, Catherine C
Emond, Jennifer A
O'Malley, A James
Byrd, Anna
Green, Peter
Miller, Katherine E
Vosoughi, Soroush
Gilbert-Diamond, Diane
Onega, Tracy
collection PubMed
description BACKGROUND: Reddit is a popular social media platform that has faced scrutiny for inflammatory language against those with obesity, yet there has been no comprehensive analysis of its obesity-related content.
OBJECTIVE: We aimed to quantify the presence of 4 types of obesity-related content on Reddit (misinformation, facts, stigma, and positivity) and identify psycholinguistic features that may be enriched within each one.
METHODS: All sentences (N=764,179) containing “obese” or “obesity” from top-level comments (n=689,447) made on non–age-restricted subreddits (ie, smaller communities within Reddit) between 2011 and 2019 that contained one of a series of keywords were evaluated. Four types of common natural language processing features were extracted: bigram term frequency–inverse document frequency, word embeddings derived from Bidirectional Encoder Representations from Transformers, sentiment from the Valence Aware Dictionary for Sentiment Reasoning, and psycholinguistic features from the Linguistic Inquiry and Word Count Program. These features were used to train an Extreme Gradient Boosting machine learning classifier to label each sentence as 1 of the 4 content categories or other. Two-part hurdle models for semicontinuous data (which use logistic regression to assess the odds of a 0 result and linear regression for continuous data) were used to evaluate whether select psycholinguistic features presented differently in misinformation (compared with facts) or stigma (compared with positivity).
RESULTS: After removing ambiguous sentences, 0.47% (3610/764,179) of the sentences were labeled as misinformation, 1.88% (14,366/764,179) were labeled as stigma, 1.94% (14,799/764,179) were labeled as positivity, and 8.93% (68,276/764,179) were labeled as facts. Each category had markers that distinguished it from other categories within the data as well as an external corpus. For example, misinformation had a higher average percent of negations (β=3.71, 95% CI 3.53-3.90; P<.001) but a lower average number of words >6 letters (β=−1.47, 95% CI −1.85 to −1.10; P<.001) relative to facts. Stigma had a higher proportion of swear words (β=1.83, 95% CI 1.62-2.04; P<.001) but a lower proportion of first-person singular pronouns (β=−5.30, 95% CI −5.44 to −5.16; P<.001) relative to positivity.
CONCLUSIONS: There are distinct psycholinguistic properties between types of obesity-related content on Reddit that can be leveraged to rapidly identify deleterious content with minimal human intervention and provide insights into how the Reddit population perceives patients with obesity. Future work should assess whether these properties are shared across languages and other social media platforms.
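The two-part hurdle model named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single binary group indicator and no other covariates, in which case the logistic-regression slope reduces to the empirical log odds ratio of a zero value and the linear-regression slope reduces to the difference in means among nonzero values. The function name `two_part_hurdle` and the toy data are hypothetical.

```python
import math
from statistics import mean


def two_part_hurdle(group_a, group_b):
    """Compare a semicontinuous feature between two groups in two parts.

    Part 1: log odds ratio of observing a zero (group_a vs group_b).
    Part 2: difference in means among the nonzero values.
    With one binary predictor and no covariates, these equal the
    logistic-regression and linear-regression slopes, respectively.
    """
    def odds_of_zero(values):
        zeros = sum(1 for v in values if v == 0)
        nonzeros = len(values) - zeros
        # Assumes each group has at least one zero and one nonzero value;
        # otherwise the odds (or their log) are undefined.
        return zeros / nonzeros

    log_odds_ratio_zero = math.log(odds_of_zero(group_a) / odds_of_zero(group_b))

    nonzero_a = [v for v in group_a if v != 0]
    nonzero_b = [v for v in group_b if v != 0]
    beta_nonzero = mean(nonzero_a) - mean(nonzero_b)

    return log_odds_ratio_zero, beta_nonzero


# Hypothetical toy data: percent of negation words per sentence.
misinformation = [0.0, 5.0, 10.0, 8.0, 0.0, 12.0]
facts = [0.0, 0.0, 0.0, 4.0, 6.0, 0.0]

log_or, beta = two_part_hurdle(misinformation, facts)
print(f"log odds ratio of a zero value: {log_or:.3f}")
print(f"difference in nonzero means:    {beta:.2f}")
```

A negative log odds ratio in part 1 means the first group is less likely to have a zero value; a positive β in part 2 means that, among sentences where the feature appears at all, it appears more in the first group. A full analysis would fit both parts with regression software so that covariates and confidence intervals can be included.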
format Online
Article
Text
id pubmed-9840103
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-9840103 2023-01-15 J Med Internet Res Original Paper JMIR Publications 2022-12-30 /pmc/articles/PMC9840103/ /pubmed/36583929 http://dx.doi.org/10.2196/36729 Text en ©Catherine C Pollack, Jennifer A Emond, A James O'Malley, Anna Byrd, Peter Green, Katherine E Miller, Soroush Vosoughi, Diane Gilbert-Diamond, Tracy Onega. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.12.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Characterizing the Prevalence of Obesity Misinformation, Factual Content, Stigma, and Positivity on the Social Media Platform Reddit Between 2011 and 2019: Infodemiology Study
topic Original Paper