
Using Natural Language Processing to Explore “Dry January” Posts on Twitter: Longitudinal Infodemiology Study

BACKGROUND: Dry January, a temporary alcohol abstinence campaign, encourages individuals to reflect on their relationship with alcohol by temporarily abstaining from consumption during the month of January. Though Dry January has become a global phenomenon, there has been limited investigation into...


Bibliographic Details
Main Authors: Russell, Alex M; Valdez, Danny; Chiang, Shawn C; Montemayor, Ben N; Barry, Adam E; Lin, Hsien-Chang; Massey, Philip M
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9719059/
https://www.ncbi.nlm.nih.gov/pubmed/36343184
http://dx.doi.org/10.2196/40160
author Russell, Alex M
Valdez, Danny
Chiang, Shawn C
Montemayor, Ben N
Barry, Adam E
Lin, Hsien-Chang
Massey, Philip M
author_facet Russell, Alex M
Valdez, Danny
Chiang, Shawn C
Montemayor, Ben N
Barry, Adam E
Lin, Hsien-Chang
Massey, Philip M
author_sort Russell, Alex M
collection PubMed
description BACKGROUND: Dry January, a temporary alcohol abstinence campaign, encourages individuals to reflect on their relationship with alcohol by temporarily abstaining from consumption during the month of January. Though Dry January has become a global phenomenon, there has been limited investigation into Dry January participants’ experiences. One means through which to gain insights into individuals’ Dry January-related experiences is by leveraging large-scale social media data (eg, Twitter chatter) to explore and characterize public discourse concerning Dry January. OBJECTIVE: We sought to answer the following questions: (1) What themes are present within a corpus of tweets about Dry January, and is there consistency in the language used to discuss Dry January across multiple years of tweets (2020-2022)? (2) Do unique themes or patterns emerge in Dry January 2021 tweets after the onset of the COVID-19 pandemic? and (3) What is the association with tweet composition (ie, sentiment and human-authored vs bot-authored) and engagement with Dry January tweets? METHODS: We applied natural language processing techniques to a large sample of tweets (n=222,917) containing the term “dry january” or “dryjanuary” posted from December 15 to February 15 across three separate years of participation (2020-2022). Term frequency inverse document frequency, k-means clustering, and principal component analysis were used for data visualization to identify the optimal number of clusters per year. Once data were visualized, we ran interpretation models to afford within-year (or within-cluster) comparisons. Latent Dirichlet allocation topic modeling was used to examine content within each cluster per given year. Valence Aware Dictionary and Sentiment Reasoner sentiment analysis was used to examine affect per cluster per year. The Botometer automated account check was used to determine average bot score per cluster per year. 
Last, to assess user engagement with Dry January content, we took the average number of likes and retweets per cluster and ran correlations with other outcome variables of interest. RESULTS: We observed several similar topics per year (eg, Dry January resources, Dry January health benefits, updates related to Dry January progress), suggesting relative consistency in Dry January content over time. Although there was overlap in themes across multiple years of tweets, unique themes related to individuals’ experiences with alcohol during the midst of the COVID-19 global pandemic were detected in the corpus of tweets from 2021. Also, tweet composition was associated with engagement, including number of likes, retweets, and quote-tweets per post. Bot-dominant clusters had fewer likes, retweets, or quote tweets compared with human-authored clusters. CONCLUSIONS: The findings underscore the utility for using large-scale social media, such as discussions on Twitter, to study drinking reduction attempts and to monitor the ongoing dynamic needs of persons contemplating, preparing for, or actively pursuing attempts to quit or cut down on their drinking.
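The first two steps named in the METHODS (term frequency inverse document frequency weighting followed by k-means clustering) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the four example tweets are invented, the smoothed IDF formula and the fixed starting centroids are assumptions (the abstract does not state which variant, initialization, or library was used), and the principal component analysis, topic modeling, sentiment, and bot-scoring steps are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF; an assumed variant, the abstract does not specify one.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, n_iter=20):
    """Lloyd's k-means; centroids start at the rows listed in init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical tweets, invented for illustration only.
tweets = [
    "dry january day ten no wine feeling great",
    "dry january tips free app to track progress",
    "day twenty of dry january still no wine",
    "download the free app for dry january tips",
]
docs = [t.split() for t in tweets]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, init_indices=[0, 1])
print(labels)  # → [0, 1, 0, 1]
```

In this toy corpus the progress-update tweets and the resource tweets fall into separate clusters because terms shared by every tweet ("dry", "january") receive the lowest IDF weight, so the rarer terms drive the distances. A real analysis would normally select the number of clusters from the data rather than fix it at two.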
format Online
Article
Text
id pubmed-9719059
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-97190592022-12-04 Using Natural Language Processing to Explore “Dry January” Posts on Twitter: Longitudinal Infodemiology Study Russell, Alex M Valdez, Danny Chiang, Shawn C Montemayor, Ben N Barry, Adam E Lin, Hsien-Chang Massey, Philip M J Med Internet Res Original Paper BACKGROUND: Dry January, a temporary alcohol abstinence campaign, encourages individuals to reflect on their relationship with alcohol by temporarily abstaining from consumption during the month of January. Though Dry January has become a global phenomenon, there has been limited investigation into Dry January participants’ experiences. One means through which to gain insights into individuals’ Dry January-related experiences is by leveraging large-scale social media data (eg, Twitter chatter) to explore and characterize public discourse concerning Dry January. OBJECTIVE: We sought to answer the following questions: (1) What themes are present within a corpus of tweets about Dry January, and is there consistency in the language used to discuss Dry January across multiple years of tweets (2020-2022)? (2) Do unique themes or patterns emerge in Dry January 2021 tweets after the onset of the COVID-19 pandemic? and (3) What is the association with tweet composition (ie, sentiment and human-authored vs bot-authored) and engagement with Dry January tweets? METHODS: We applied natural language processing techniques to a large sample of tweets (n=222,917) containing the term “dry january” or “dryjanuary” posted from December 15 to February 15 across three separate years of participation (2020-2022). Term frequency inverse document frequency, k-means clustering, and principal component analysis were used for data visualization to identify the optimal number of clusters per year. Once data were visualized, we ran interpretation models to afford within-year (or within-cluster) comparisons. 
Latent Dirichlet allocation topic modeling was used to examine content within each cluster per given year. Valence Aware Dictionary and Sentiment Reasoner sentiment analysis was used to examine affect per cluster per year. The Botometer automated account check was used to determine average bot score per cluster per year. Last, to assess user engagement with Dry January content, we took the average number of likes and retweets per cluster and ran correlations with other outcome variables of interest. RESULTS: We observed several similar topics per year (eg, Dry January resources, Dry January health benefits, updates related to Dry January progress), suggesting relative consistency in Dry January content over time. Although there was overlap in themes across multiple years of tweets, unique themes related to individuals’ experiences with alcohol during the midst of the COVID-19 global pandemic were detected in the corpus of tweets from 2021. Also, tweet composition was associated with engagement, including number of likes, retweets, and quote-tweets per post. Bot-dominant clusters had fewer likes, retweets, or quote tweets compared with human-authored clusters. CONCLUSIONS: The findings underscore the utility for using large-scale social media, such as discussions on Twitter, to study drinking reduction attempts and to monitor the ongoing dynamic needs of persons contemplating, preparing for, or actively pursuing attempts to quit or cut down on their drinking. JMIR Publications 2022-11-18 /pmc/articles/PMC9719059/ /pubmed/36343184 http://dx.doi.org/10.2196/40160 Text en ©Alex M Russell, Danny Valdez, Shawn C Chiang, Ben N Montemayor, Adam E Barry, Hsien-Chang Lin, Philip M Massey. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.11.2022. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Russell, Alex M
Valdez, Danny
Chiang, Shawn C
Montemayor, Ben N
Barry, Adam E
Lin, Hsien-Chang
Massey, Philip M
Using Natural Language Processing to Explore “Dry January” Posts on Twitter: Longitudinal Infodemiology Study
title Using Natural Language Processing to Explore “Dry January” Posts on Twitter: Longitudinal Infodemiology Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9719059/
https://www.ncbi.nlm.nih.gov/pubmed/36343184
http://dx.doi.org/10.2196/40160