Content Analysis of Tobacco-related Twitter Posts

OBJECTIVE: We present results of a content analysis of tobacco-related Twitter posts (tweets), focusing on tweets referencing e-cigarettes and hookah. INTRODUCTION: Vast amounts of free, real-time, localizable Twitter data offer new possibilities for public health workers to identify trends and atti...

Bibliographic Details
Main Authors: Myslín, Mark, Zhu, Shu-Hong, Conway, Michael
Format: Online Article Text
Language: English
Published: University of Illinois at Chicago Library 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692913/
_version_ 1782274685438787584
author Myslín, Mark
Zhu, Shu-Hong
Conway, Michael
author_facet Myslín, Mark
Zhu, Shu-Hong
Conway, Michael
author_sort Myslín, Mark
collection PubMed
description OBJECTIVE: We present results of a content analysis of tobacco-related Twitter posts (tweets), focusing on tweets referencing e-cigarettes and hookah.
INTRODUCTION: Vast amounts of free, real-time, localizable Twitter data offer new possibilities for public health workers to identify trends and attitudes that more traditional surveillance methods may not capture, particularly in emerging areas of public health concern where reliable statistical evidence is not readily accessible. Existing applications include tracking public informedness during disease outbreaks [1]. Twitter-based surveillance is particularly suited to new challenges in tobacco control. Hookah and e-cigarettes have surged in popularity, yet regulation and public information remain sparse, despite controversial health effects [2,3]. Ubiquitous online marketing of these products and their popularity among new and younger users make Twitter a key resource for tobacco surveillance.
METHODS: We collected 7,300 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012, using ten general keywords such as cig(*) and hookah. Each tweet was manually classified using a tri-axial scheme, capturing genre (firsthand experience, joke, news, …), theme (underage usage, health, social image, …), and sentiment (positive, negative, neutral). Machine-learning classifiers were trained to detect tobacco-related vs. irrelevant tweets as well as each of the above categories, using Naïve Bayes, k-Nearest Neighbors, and Support Vector Machine algorithms. Finally, phi correlation coefficients were computed between each of the categories to discover emergent patterns.
RESULTS: The most prevalent genre of tweets was personal experience, followed by categories such as opinion, marketing, and news. The most common themes were hookah, cessation, and social image, and sentiment toward tobacco was more positive (26%) than negative (20%). The most highly correlated categories were social image–underage, marketing–e-cigs, and personal experience–positive sentiment. E-cigarettes were also correlated with positive sentiment and new users (even excluding marketing posts), while hookah was highly correlated with positive sentiment, pleasure, and social relationships. Further, tweets matching the term “hookah” reflected the most positive sentiment, and “tobacco” the most negative (Figure 1). Finally, negative sentiment correlated most highly with social image, disgust, and non-experiential categories such as opinion and information. The best machine classification performance for tobacco vs. nontobacco tweets was achieved by an SVM classifier with 82% accuracy (baseline 57%). Individual categories showed similar improvements over baseline.
CONCLUSIONS: Several novel findings speak to the unique insights of Twitter surveillance. Sentiment toward tobacco among Twitter users is more positive than negative, affirming Twitter’s value in understanding positive sentiment. Negative sentiment is equally useful: for example, observed high correlations between negative sentiment and social image, but not health, may usefully inform outreach strategies. Twitter surveillance further reveals opportunities for education: positive sentiment toward the term “hookah” but negative sentiment toward “tobacco” suggests a disconnect in users’ perceptions of hookah’s health effects. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, allowing for automated tobacco surveillance applications. [Figure: see text]
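The phi correlation mentioned in the methods can be illustrated with a minimal sketch. It assumes each category is stored as one boolean per tweet (True if the tweet was assigned that category); the function name `phi_coefficient` and the toy labels are hypothetical and are not taken from the study, whose actual computation is not described in the abstract.

```python
from math import sqrt
from typing import Sequence


def phi_coefficient(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Phi correlation between two binary label vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")

    # Cells of the 2x2 contingency table.
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)

    # Product of the four marginal totals.
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # A category that is always or never present has no defined
        # correlation; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical toy annotations for six tweets (not data from the study).
mentions_hookah = [True, True, True, False, False, False]
positive_sentiment = [True, True, False, False, False, True]

phi = phi_coefficient(mentions_hookah, positive_sentiment)
print(f"phi = {phi:.3f}")
```

Values near +1 mean the two categories tend to be assigned to the same tweets, values near -1 mean they rarely co-occur, and values near 0 indicate no linear association.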
format Online
Article
Text
id pubmed-3692913
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher University of Illinois at Chicago Library
record_format MEDLINE/PubMed
spelling pubmed-3692913 2013-06-26 Content Analysis of Tobacco-related Twitter Posts Myslín, Mark Zhu, Shu-Hong Conway, Michael Online J Public Health Inform ISDS 2012 Conference Abstracts University of Illinois at Chicago Library 2013-04-04 /pmc/articles/PMC3692913/ Text en ©2013 the author(s) http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/ojphi/about/submissions#copyrightNotice This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.
spellingShingle ISDS 2012 Conference Abstracts
Myslín, Mark
Zhu, Shu-Hong
Conway, Michael
Content Analysis of Tobacco-related Twitter Posts
title Content Analysis of Tobacco-related Twitter Posts
title_full Content Analysis of Tobacco-related Twitter Posts
title_fullStr Content Analysis of Tobacco-related Twitter Posts
title_full_unstemmed Content Analysis of Tobacco-related Twitter Posts
title_short Content Analysis of Tobacco-related Twitter Posts
title_sort content analysis of tobacco-related twitter posts
topic ISDS 2012 Conference Abstracts
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692913/
work_keys_str_mv AT myslinmark contentanalysisoftobaccorelatedtwitterposts
AT zhushuhong contentanalysisoftobaccorelatedtwitterposts
AT conwaymichael contentanalysisoftobaccorelatedtwitterposts