Identifying tweets of personal health experience through word embedding and LSTM neural network

BACKGROUND: As Twitter has become an active data source for health surveillance research, it is important that efficient and effective methods are developed to identify tweets related to personal health experience. Conventional classification algorithms rely on features engineered by human domain ex...

Bibliographic Details
Main Authors:	Jiang, Keyuan; Feng, Shichao; Song, Qunhao; Calix, Ricardo A.; Gupta, Matrika; Bernard, Gordon R.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998756/ https://www.ncbi.nlm.nih.gov/pubmed/29897323 http://dx.doi.org/10.1186/s12859-018-2198-y

author	Jiang, Keyuan Feng, Shichao Song, Qunhao Calix, Ricardo A. Gupta, Matrika Bernard, Gordon R.
collection	PubMed
description	BACKGROUND: As Twitter has become an active data source for health surveillance research, efficient and effective methods are needed to identify tweets related to personal health experience. Conventional classification algorithms rely on features engineered by human domain experts, and engineering such features is challenging and requires considerable human expertise. The resulting features may not be optimal for the classification problem, and the many ways of expressing or describing personal experience in tweets make it difficult for conventional classifiers to correctly predict personal experience tweets (PETs). In this study, we developed a method that combines word embedding and a long short-term memory (LSTM) model without the need to engineer any specific features. Through word embedding, tweet texts were represented as dense vectors, which in turn were fed to the LSTM neural network as sequences. RESULTS: Statistical analyses of 10-fold cross-validation results for our method and conventional methods indicate significant differences (p < 0.01) in accuracy, precision, recall, F1-score, and ROC/AUC, demonstrating that our approach outperforms the conventional methods in identifying PETs. CONCLUSION: We presented an efficient and effective method of identifying health-related personal experience tweets by combining word embedding and an LSTM neural network. Because it does not require the laborious and costly process of engineering features, our method could conceivably help accelerate and scale up the analysis of social media text for health surveillance.
format	Online Article Text
id	pubmed-5998756
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5998756 2018-06-25 Identifying tweets of personal health experience through word embedding and LSTM neural network Jiang, Keyuan Feng, Shichao Song, Qunhao Calix, Ricardo A. Gupta, Matrika Bernard, Gordon R. BMC Bioinformatics Research BioMed Central 2018-06-13 /pmc/articles/PMC5998756/ /pubmed/29897323 http://dx.doi.org/10.1186/s12859-018-2198-y Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Identifying tweets of personal health experience through word embedding and LSTM neural network
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998756/ https://www.ncbi.nlm.nih.gov/pubmed/29897323 http://dx.doi.org/10.1186/s12859-018-2198-y
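The description field in this record says that tweet texts were represented as dense vectors through word embedding and then fed to an LSTM network as sequences. The sketch below illustrates that data flow only. It is not the authors' implementation: the class name `TinyLSTMClassifier`, the whitespace tokenizer, the vector sizes, and the example tweet are all hypothetical, and the weights are random and untrained, so the output probability carries no meaning.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Embedding lookup -> one LSTM layer -> sigmoid output (forward pass only).

    Hypothetical illustration; sizes and tokenization are assumptions,
    not values taken from the article.
    """

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.vocab = {word: idx for idx, word in enumerate(vocab)}
        self.hidden_dim = hidden_dim
        # One dense vector per word; the extra last row serves unknown words.
        self.embeddings = random_matrix(len(vocab) + 1, embed_dim, rng)
        # Each gate has its own weights over the concatenation [x_t, h_{t-1}].
        self.weights = {
            name: random_matrix(hidden_dim, embed_dim + hidden_dim, rng)
            for name in GATES
        }
        self.biases = {name: [0.0] * hidden_dim for name in GATES}
        self.out_weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.out_bias = 0.0

    def embed(self, tweet):
        """Turn a tweet into a sequence of dense vectors (no hand-made features)."""
        unknown = len(self.vocab)
        tokens = tweet.lower().split()
        return [self.embeddings[self.vocab.get(tok, unknown)] for tok in tokens]

    def predict_proba(self, tweet):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in self.embed(tweet):
            z = x + h  # list concatenation: [x_t, h_{t-1}]
            pre = {
                name: [a + b for a, b in zip(matvec(self.weights[name], z),
                                             self.biases[name])]
                for name in GATES
            }
            i_gate = [sigmoid(v) for v in pre["input"]]
            f_gate = [sigmoid(v) for v in pre["forget"]]
            o_gate = [sigmoid(v) for v in pre["output"]]
            cand = [math.tanh(v) for v in pre["candidate"]]
            # Standard LSTM update: forget part of the old cell, add new content.
            c = [f * cj + i * g for f, cj, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
        logit = sum(w * hj for w, hj in zip(self.out_weights, h)) + self.out_bias
        return sigmoid(logit)


vocab = ["i", "took", "ibuprofen", "and", "my", "headache", "went", "away"]
model = TinyLSTMClassifier(vocab)
prob = model.predict_proba("I took ibuprofen and my headache went away")
print(f"P(personal experience tweet) = {prob:.3f} (untrained, illustrative only)")
```

In practice the embedding table and LSTM weights would be learned from labeled tweets with a deep learning library, and the model would be evaluated with 10-fold cross-validation as the description reports. The point of the sketch is that the classifier consumes raw token sequences, with no hand-engineered features in between.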