
Deep Learning for Identification of Alcohol-Related Content on Social Media (Reddit and Twitter): Exploratory Analysis of Alcohol-Related Outcomes

BACKGROUND: Many social media studies have explored the ability of thematic structures, such as hashtags and subreddits, to identify information related to a wide variety of mental health disorders. However, studies and models trained on specific themed communities are often difficult to apply to di...


Bibliographic Details
Main Authors:	Ricard, Benjamin Joseph; Hassanpour, Saeed
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8482254/ https://www.ncbi.nlm.nih.gov/pubmed/34524095 http://dx.doi.org/10.2196/27314

_version_	1784576864527843328
author	Ricard, Benjamin Joseph; Hassanpour, Saeed
collection	PubMed
description	BACKGROUND: Many social media studies have explored the ability of thematic structures, such as hashtags and subreddits, to identify information related to a wide variety of mental health disorders. However, studies and models trained on specific themed communities are often difficult to apply to different social media platforms and related outcomes. A deep learning framework using thematic structures from Reddit and Twitter can have distinct advantages for studying alcohol abuse, particularly among youth in the United States. OBJECTIVE: This study proposes a new deep learning pipeline that uses thematic structures to identify alcohol-related content across different platforms. We apply our method to Twitter to determine the association of the prevalence of alcohol-related tweets with alcohol-related outcomes reported by the National Institute on Alcohol Abuse and Alcoholism, the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System, county health rankings, and the North American Industry Classification System. METHODS: A Bidirectional Encoder Representations From Transformers (BERT) neural network was trained to classify 1,302,524 Reddit posts as coming from either alcohol-related or control subreddits. The trained model identified 24 alcohol-related hashtags from an unlabeled data set of 843,769 random tweets. Querying these alcohol-related hashtags identified 25,558,846 alcohol-related tweets, including 790,544 location-specific (geotagged) tweets. We calculated the correlation between the prevalence of alcohol-related tweets and alcohol-related outcomes, controlling for the confounding effects of age, sex, income, education, and self-reported race, as recorded by the 2013-2018 American Community Survey.
RESULTS: Significant associations were observed between alcohol-hashtagged tweets and alcohol consumption (P=.01) and heavy drinking (P=.005) but not binge drinking (P=.37), self-reported at the metropolitan-micropolitan statistical area level; between alcohol-hashtagged tweets and self-reported excessive drinking behavior (P=.03) but not motor vehicle fatalities involving alcohol (P=.21); between alcohol-hashtagged tweets and the number of breweries (P<.001), wineries (P<.001), and beer, wine, and liquor stores (P<.001) but not drinking places (P=.23), per capita at the US county and county-equivalent level; and between alcohol-hashtagged tweets and all gallons of ethanol consumed (P<.001), as well as ethanol consumed from wine (P<.001) and liquor (P=.01) sources but not beer (P=.63), at the US state level. CONCLUSIONS: Here, we present a novel natural language processing pipeline, developed using Reddit’s alcohol-related subreddits, that identifies highly specific alcohol-related Twitter hashtags. The prevalence of the identified hashtags contains interpretable information about alcohol consumption at both coarse (eg, US state) and fine-grained (eg, metropolitan-micropolitan statistical area and county) geographical designations. This approach can expand research and deep learning interventions on alcohol abuse and other behavioral health outcomes.
format	Online Article Text
id	pubmed-8482254
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-8482254 2021-11-24 Deep Learning for Identification of Alcohol-Related Content on Social Media (Reddit and Twitter): Exploratory Analysis of Alcohol-Related Outcomes. Ricard, Benjamin Joseph; Hassanpour, Saeed. J Med Internet Res. Original Paper. JMIR Publications 2021-09-15 /pmc/articles/PMC8482254/ /pubmed/34524095 http://dx.doi.org/10.2196/27314 Text en ©Benjamin Joseph Ricard, Saeed Hassanpour. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 15.09.2021.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title	Deep Learning for Identification of Alcohol-Related Content on Social Media (Reddit and Twitter): Exploratory Analysis of Alcohol-Related Outcomes
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8482254/ https://www.ncbi.nlm.nih.gov/pubmed/34524095 http://dx.doi.org/10.2196/27314
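
Illustrative note on the METHODS summary above: the abstract reports correlations between tweet prevalence and alcohol-related outcomes "controlling for" demographic confounders, but this record does not say which estimator was used. One common way to do this is a partial correlation, in which both variables are regressed on the confounders and the residuals are correlated. The sketch below is only an assumption about how such a step could look, not the authors' code; the function name `partial_correlation` and all of the data are hypothetical and synthetic.

```python
import numpy as np


def partial_correlation(x, y, covariates):
    """Pearson correlation of x and y after linearly regressing both on the covariates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(covariates, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    # Design matrix with an intercept column plus one column per confounder.
    design = np.column_stack([np.ones(len(x)), z])
    beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Residuals: the part of each variable the confounders do not explain.
    res_x = x - design @ beta_x
    res_y = y - design @ beta_y
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Synthetic stand-ins (hypothetical values, not data from the study).
rng = np.random.default_rng(0)
n_regions = 200
confounders = rng.normal(size=(n_regions, 3))  # e.g. age, income, education
tweet_prevalence = confounders @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n_regions)
outcome = (
    0.8 * tweet_prevalence
    + confounders @ np.array([0.3, 0.3, -0.4])
    + rng.normal(size=n_regions)
)

r = partial_correlation(tweet_prevalence, outcome, confounders)
print(f"partial correlation: {r:.3f}")
```

Residualizing both variables removes whatever the confounders explain linearly, so the remaining correlation reflects only the variation they do not account for. A significance test, such as those behind the P values reported in the abstract, would be a separate step.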
