Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis

BACKGROUND: Electronic nicotine delivery system (ENDS) brands, such as JUUL, used social media as a key component of their marketing strategy, which led to massive sales growth from 2015 to 2018. During this time, ENDS use rapidly increased among youths and young adults, with flavored products being...

Bibliographic Details
Main Authors: Chew, Rob; Wenger, Michael; Guillory, Jamie; Nonnemaker, James; Kim, Annice
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8808345/
https://www.ncbi.nlm.nih.gov/pubmed/35040793
http://dx.doi.org/10.2196/30257
_version_ 1784643868122152960
author Chew, Rob
Wenger, Michael
Guillory, Jamie
Nonnemaker, James
Kim, Annice
author_facet Chew, Rob
Wenger, Michael
Guillory, Jamie
Nonnemaker, James
Kim, Annice
author_sort Chew, Rob
collection PubMed
description BACKGROUND: Electronic nicotine delivery system (ENDS) brands, such as JUUL, used social media as a key component of their marketing strategy, which led to massive sales growth from 2015 to 2018. During this time, ENDS use rapidly increased among youths and young adults, with flavored products being particularly popular among these groups. OBJECTIVE: The aim of our study is to develop a named entity recognition (NER) model to identify potential emerging vaping brands and flavors from Instagram post text. NER is a natural language processing task for identifying specific types of words (entities) in text based on the characteristics of the entity and surrounding words. METHODS: NER models were trained on a labeled data set of 2272 Instagram posts coded for ENDS brands and flavors. We compared three types of NER models (conditional random fields, a residual convolutional neural network, and a fine-tuned distilled bidirectional encoder representations from transformers [FTDB] network) to identify brands and flavors in Instagram posts, with key model outcomes of precision, recall, and F1 scores. We used data from Nielsen scanner sales and Wikipedia to create benchmark dictionaries to determine whether brands from established ENDS brand and flavor lists were mentioned in the Instagram posts in our sample. To prevent overfitting, we performed 5-fold cross-validation and reported the mean and SD of the model validation metrics across the folds. RESULTS: For brands, the residual convolutional neural network exhibited the highest mean precision (0.797, SD 0.084), and the FTDB exhibited the highest mean recall (0.869, SD 0.103). For flavors, the FTDB exhibited both the highest mean precision (0.860, SD 0.055) and recall (0.801, SD 0.091). All NER models outperformed the benchmark brand and flavor dictionary look-ups on mean precision, recall, and F1. Between the benchmark brand lists, the larger Wikipedia list outperformed the Nielsen list in both precision and recall. CONCLUSIONS: Our findings suggest that NER models correctly identified ENDS brands and flavors in Instagram posts at rates competitive with, or better than, others in the published literature. Brands identified during manual annotation showed little overlap with those in Nielsen scanner data, suggesting that NER models may capture emerging brands with limited sales and distribution. NER models address the challenges of manual brand identification and can be used to support future infodemiology and infoveillance studies. Brands identified on social media should be cross-validated with Nielsen and other data sources to differentiate emerging brands that have become established from those with limited sales and distribution.
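The dictionary look-up benchmark and the precision, recall, and F1 metrics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the whole-word matching rule, the placeholder brand "examplevape", and the toy fold scores are all assumptions made for illustration. The sample SD is used because the abstract does not say which SD was reported.

```python
import re
from statistics import mean, stdev


def dictionary_lookup(text, dictionary):
    """Return the dictionary terms that occur as whole words in text.

    Assumption: case-insensitive whole-word matching; the abstract does
    not describe the matching rule used for the benchmark dictionaries.
    """
    lowered = text.lower()
    return {
        term
        for term in dictionary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two sets of entity strings."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def summarize_folds(fold_scores):
    """Return the mean and sample SD of one metric across validation folds."""
    return mean(fold_scores), stdev(fold_scores)


if __name__ == "__main__":
    # Hypothetical benchmark dictionary and a hand-labeled post.
    brands = {"juul", "examplevape"}
    post = "Loving my new JUUL pods"
    gold = {"juul"}

    predicted = dictionary_lookup(post, brands)
    print(precision_recall_f1(predicted, gold))

    # Toy per-fold precision values for a 5-fold cross-validation run.
    print(summarize_folds([0.70, 0.80, 0.75, 0.85, 0.90]))
```

A trained NER model would replace dictionary_lookup as the source of the predicted set; the metric and fold-summary functions would stay the same.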
format Online
Article
Text
id pubmed-8808345
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8808345 2022-02-04 Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis Chew, Rob Wenger, Michael Guillory, Jamie Nonnemaker, James Kim, Annice J Med Internet Res Original Paper JMIR Publications 2022-01-18 /pmc/articles/PMC8808345/ /pubmed/35040793 http://dx.doi.org/10.2196/30257 Text en ©Rob Chew, Michael Wenger, Jamie Guillory, James Nonnemaker, Annice Kim. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.01.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Chew, Rob
Wenger, Michael
Guillory, Jamie
Nonnemaker, James
Kim, Annice
Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis
title Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis
title_full Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis
title_fullStr Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis
title_full_unstemmed Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis
title_short Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis
title_sort identifying electronic nicotine delivery system brands and flavors on instagram: natural language processing analysis
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8808345/
https://www.ncbi.nlm.nih.gov/pubmed/35040793
http://dx.doi.org/10.2196/30257
work_keys_str_mv AT chewrob identifyingelectronicnicotinedeliverysystembrandsandflavorsoninstagramnaturallanguageprocessinganalysis
AT wengermichael identifyingelectronicnicotinedeliverysystembrandsandflavorsoninstagramnaturallanguageprocessinganalysis
AT guilloryjamie identifyingelectronicnicotinedeliverysystembrandsandflavorsoninstagramnaturallanguageprocessinganalysis
AT nonnemakerjames identifyingelectronicnicotinedeliverysystembrandsandflavorsoninstagramnaturallanguageprocessinganalysis
AT kimannice identifyingelectronicnicotinedeliverysystembrandsandflavorsoninstagramnaturallanguageprocessinganalysis