Automated Detection of Vaping-Related Tweets on Twitter During the 2019 EVALI Outbreak Using Machine Learning Classification

There are increasingly strict regulations surrounding the purchase and use of combustible tobacco products (i.e., cigarettes); simultaneously, the use of other tobacco products, including e-cigarettes (i.e., vaping products), has dramatically increased. However, public attitudes toward vaping vary w...

Bibliographic Details
Main authors:	Ren, Yang; Wu, Dezhi; Singh, Avineet; Kasson, Erin; Huang, Ming; Cavazos-Rehg, Patricia
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Big Data
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8866955/ https://www.ncbi.nlm.nih.gov/pubmed/35224484 http://dx.doi.org/10.3389/fdata.2022.770585

author	Ren, Yang; Wu, Dezhi; Singh, Avineet; Kasson, Erin; Huang, Ming; Cavazos-Rehg, Patricia
collection	PubMed
description	There are increasingly strict regulations surrounding the purchase and use of combustible tobacco products (i.e., cigarettes); simultaneously, the use of other tobacco products, including e-cigarettes (i.e., vaping products), has dramatically increased. However, public attitudes toward vaping vary widely, and the health effects of vaping are still largely unknown. As a popular social media platform, Twitter contains rich information shared by users about their behaviors and experiences, including opinions on vaping. Manually identifying vaping-related tweets to source useful information is very challenging. In the current study, we proposed to develop a detection model to accurately identify vaping-related tweets using machine learning and deep learning methods. Specifically, we applied seven popular machine learning and deep learning algorithms, including Naïve Bayes, Support Vector Machine, Random Forest, XGBoost, Multilayer Perceptron, Transformer Neural Network, and stacking and voting ensemble models to build our customized classification model. We extracted a set of sample tweets during an outbreak of e-cigarette or vaping-related lung injury (EVALI) in 2019 and created an annotated corpus to train and evaluate these models. After comparing the performance of each model, we found that the stacking ensemble learning model achieved the highest performance, with an F1-score of 0.97. All models could achieve 0.90 or higher after tuning hyperparameters. The ensemble learning model has the best average performance. Our study findings provide informative guidelines and practical implications for the automated detection of themed social media data for public opinion and health surveillance purposes.
format	Online Article Text
id	pubmed-8866955
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8866955 2022-02-25 Automated Detection of Vaping-Related Tweets on Twitter During the 2019 EVALI Outbreak Using Machine Learning Classification. Ren, Yang; Wu, Dezhi; Singh, Avineet; Kasson, Erin; Huang, Ming; Cavazos-Rehg, Patricia. Front Big Data. Big Data. Frontiers Media S.A. 2022-02-10 /pmc/articles/PMC8866955/ /pubmed/35224484 http://dx.doi.org/10.3389/fdata.2022.770585 Text en Copyright © 2022 Ren, Wu, Singh, Kasson, Huang and Cavazos-Rehg. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Automated Detection of Vaping-Related Tweets on Twitter During the 2019 EVALI Outbreak Using Machine Learning Classification
topic	Big Data
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8866955/ https://www.ncbi.nlm.nih.gov/pubmed/35224484 http://dx.doi.org/10.3389/fdata.2022.770585

Similar items