Automated Identification of Hookahs (Waterpipes) on Instagram: An Application in Feature Extraction Using Convolutional Neural Network and Support Vector Machine Classification

BACKGROUND: Instagram, with millions of posts per day, can be used to inform public health surveillance targets and policies. However, current research relying on image-based data often relies on hand coding of images, which is time-consuming and costly, ultimately limiting the scope of the study. C...

Bibliographic Details
Main Authors: Zhang, Youshan; Allem, Jon-Patrick; Unger, Jennifer Beth; Boley Cruz, Tess
Format: Online Article Text
Language: English
Published: JMIR Publications 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6282010/
https://www.ncbi.nlm.nih.gov/pubmed/30452385
http://dx.doi.org/10.2196/10513
author Zhang, Youshan
Allem, Jon-Patrick
Unger, Jennifer Beth
Boley Cruz, Tess
collection PubMed
description BACKGROUND: Instagram, with millions of posts per day, can be used to inform public health surveillance targets and policies. However, current research relying on image-based data often relies on hand coding of images, which is time-consuming and costly, ultimately limiting the scope of the study. Current best practices in automated image classification (eg, support vector machine (SVM), backpropagation neural network, and artificial neural network) are limited in their capacity to accurately distinguish between objects within images.
OBJECTIVE: This study aimed to demonstrate how a convolutional neural network (CNN) can be used to extract unique features within an image and how SVM can then be used to classify the image.
METHODS: Images of waterpipes or hookah (an emerging tobacco product possessing similar harms to that of cigarettes) were collected from Instagram and used in the analyses (N=840). A CNN was used to extract unique features from images identified to contain waterpipes. An SVM classifier was built to distinguish between images with and without waterpipes. Methods for image classification were then compared to show how a CNN+SVM classifier could improve accuracy.
RESULTS: As the number of validated training images increased, the total number of extracted features increased. In addition, as the number of features learned by the SVM classifier increased, the average level of accuracy increased. Overall, 99.5% (418/420) of images classified were correctly identified as either hookah or nonhookah images. This level of accuracy was an improvement over earlier methods that used SVM, CNN, or bag-of-features alone.
CONCLUSIONS: A CNN extracts more features of images, allowing an SVM classifier to be better informed, resulting in higher accuracy compared with methods that extract fewer features. Future research can use this method to grow the scope of image-based studies. The methods presented here might help detect increases in the popularity of certain tobacco products over time on social media. By taking images of waterpipes from Instagram, we place our methods in a context that can be utilized to inform health researchers analyzing social media to understand user experience with emerging tobacco products and inform public health surveillance targets and policies.
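The two-stage idea in the abstract (convolutional feature extraction followed by SVM classification) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architecture, training procedure, or data format, so everything here is an assumption made for illustration. Fixed Sobel filters stand in for learned CNN filters, synthetic 16x16 images with a vertical or horizontal bar stand in for hookah and nonhookah photos, and the linear SVM is trained with a plain hinge-loss subgradient method. All function names are hypothetical.

```python
import numpy as np

# ASSUMPTION: two fixed Sobel kernels stand in for the filters a trained
# CNN would learn; the paper's actual network is not described in the abstract.
FILTERS = [
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),   # vertical edges
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),   # horizontal edges
]


def make_image(rng, label, size=16):
    """Synthetic stand-in image: a vertical bar (label +1) or horizontal bar (label -1)."""
    img = rng.normal(0.0, 0.05, (size, size))  # low-amplitude background noise
    pos = rng.integers(3, size - 3)
    if label == 1:
        img[:, pos] += 1.0
    else:
        img[pos, :] += 1.0
    return img


def conv2d_valid(img, kernel):
    """'Valid' 2D cross-correlation, the operation used in CNN convolution layers."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + h - kh + 1, j:j + w - kw + 1]
    return out


def extract_features(img):
    """Convolution -> ReLU -> global max pooling, giving one feature per filter."""
    return np.array([np.maximum(conv2d_valid(img, k), 0.0).max() for k in FILTERS])


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM fitted by full-batch subgradient descent on the regularized hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples that violate the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def run_demo(seed=0, n_per_class=100):
    """Extract features, train on half the images, and return accuracy on the other half."""
    rng = np.random.default_rng(seed)
    labels = np.array([1, -1] * n_per_class)
    X = np.stack([extract_features(make_image(rng, lab)) for lab in labels])
    half = len(labels) // 2
    w, b = train_linear_svm(X[:half], labels[:half])
    preds = np.where(X[half:] @ w + b >= 0, 1, -1)
    return float((preds == labels[half:]).mean())


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic data: {run_demo():.3f}")
```

The design point the sketch tries to show is the separation of concerns: the convolutional stage turns each image into a short feature vector, and the SVM only ever sees those vectors. Accuracy on this synthetic data says nothing about the figures reported for the Instagram images.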
format Online
Article
Text
id pubmed-6282010
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-6282010 2019-01-03 J Med Internet Res Original Paper JMIR Publications 2018-11-21 /pmc/articles/PMC6282010/ /pubmed/30452385 http://dx.doi.org/10.2196/10513 Text en ©Youshan Zhang, Jon-Patrick Allem, Jennifer Beth Unger, Tess Boley Cruz. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 21.11.2018. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
topic Original Paper