Using Twitter to Examine Smoking Behavior and Perceptions of Emerging Tobacco Products

BACKGROUND: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users’ levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by...

Bibliographic Details
Main Authors: Myslín, Mark, Zhu, Shu-Hong, Chapman, Wendy, Conway, Mike
Format: Online Article Text
Language: English
Published: JMIR Publications Inc. 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3758063/
https://www.ncbi.nlm.nih.gov/pubmed/23989137
http://dx.doi.org/10.2196/jmir.2534
_version_ 1782282313667706880
author Myslín, Mark
Zhu, Shu-Hong
Chapman, Wendy
Conway, Mike
author_sort Myslín, Mark
collection PubMed
description BACKGROUND: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users’ levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes. OBJECTIVE: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes. METHODS: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naïve Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns. RESULTS: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. 
These metrics included correlations between categories in the annotation scheme (phi(hookah-positive)=0.39; phi(e-cigs-positive)=0.19); correlations between search keywords and sentiment (χ²(4)=414.50, P<.001, Cramér’s V=0.36); and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved best performance in discriminating tobacco-related from unrelated tweets (F score=0.85). CONCLUSIONS: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications.
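The association measures named in the abstract (the phi coefficient between two annotation categories, and Cramér's V derived from a chi-square statistic) can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook formulas, not the authors' code: the function names and the counts in the usage section are hypothetical, and the record does not give the contingency tables behind the reported values.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient for two binary labels from a 2x2 table of counts.

    n11: both labels present; n10: only the first; n01: only the second;
    n00: neither. Raises ValueError if any marginal total is zero.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (n11 * n00 - n10 * n01) / denom


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the study):
    # rows = tweet mentions hookah (yes/no), columns = positive sentiment (yes/no).
    print(phi_coefficient(30, 10, 10, 30))
    print(cramers_v([[30, 10], [10, 30]]))
```

For a 2x2 table, Cramér's V equals the absolute value of phi, which is why both measures can be read on the same 0 to 1 scale of association strength.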
format Online
Article
Text
id pubmed-3758063
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher JMIR Publications Inc.
record_format MEDLINE/PubMed
spelling pubmed-37580632013-08-30 Using Twitter to Examine Smoking Behavior and Perceptions of Emerging Tobacco Products Myslín, Mark Zhu, Shu-Hong Chapman, Wendy Conway, Mike J Med Internet Res Original Paper JMIR Publications Inc. 2013-08-29 /pmc/articles/PMC3758063/ /pubmed/23989137 http://dx.doi.org/10.2196/jmir.2534 Text en ©Mark Myslín, Shu-Hong Zhu, Wendy Chapman, Mike Conway. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 29.08.2013.
http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title Using Twitter to Examine Smoking Behavior and Perceptions of Emerging Tobacco Products
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3758063/
https://www.ncbi.nlm.nih.gov/pubmed/23989137
http://dx.doi.org/10.2196/jmir.2534