
Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study


Bibliographic Details
Main authors:	Wakamiya, Shoko; Kawai, Yukiko; Aramaki, Eiji
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2018
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6231889/ https://www.ncbi.nlm.nih.gov/pubmed/30274968 http://dx.doi.org/10.2196/publichealth.8627

Collection:	PubMed

Abstract:
BACKGROUND: The recent rise in the popularity and scale of social networking services (SNSs) has increased the need for SNS-based information extraction systems. A popular application of SNS data is health surveillance: predicting epidemic outbreaks by detecting diseases mentioned in text messages posted on SNS platforms. Such applications share the same logic, treating SNS users as social sensors. These social sensor–based approaches also share a common problem: SNS-based surveillance is much more reliable when enough users are active, and small or inactive populations produce inconsistent results.
OBJECTIVE: This study proposes a novel approach to estimating the trend in patient numbers using indirect information within posts, for both urban and rural areas.
METHODS: We present a TRAP model that embeds both direct information and indirect information. A collection of tweets spanning 3 years (7 million influenza-related tweets in Japanese) was used to evaluate the model. Both direct information and indirect information (posts that mention other places) were used. Because indirect information is less reliable (too noisy or too outdated) than direct information, it was not used directly; instead, it was treated as inhibiting direct information. For example, frequent indirect information was taken to signify that everyone already knew about the disease, which leads to fewer direct reports.
RESULTS: Estimation performance was evaluated using the correlation coefficient between the number of influenza cases (the gold standard) and the values estimated by the models. The baseline model (BASELINE+NLP) achieved a correlation of .36, whereas the proposed model (TRAP+NLP) achieved .70, an improvement of .34 points.
CONCLUSIONS: The proposed approach, in which indirect information inhibits direct information, improved estimation performance in both rural and urban cities, demonstrating the effectiveness of the proposed method, which combines a TRAP model with natural language processing (NLP) classification.

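The abstract does not give the TRAP model's equations, so the following is only a minimal Python sketch of the two ideas it names. First, the evaluation metric: assuming "correlation coefficient" means Pearson's r, `pearson_r` compares a gold-standard case series with an estimated series. Second, one hypothetical way indirect mentions could "inhibit" direct reports: it assumes observed direct counts are damped as `direct = cases / (1 + alpha * indirect)` and inverts that damping. The names `pearson_r`, `inhibition_corrected`, and `alpha`, the damping form, and the weekly series are illustrative assumptions, not the authors' model or data.

```python
import math
from typing import List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def inhibition_corrected(direct: Sequence[float], indirect: Sequence[float],
                         alpha: float) -> List[float]:
    """HYPOTHETICAL toy correction, not the TRAP model.

    Assumes direct counts are damped by indirect mentions as
    direct = cases / (1 + alpha * indirect) and undoes that damping.
    """
    if len(direct) != len(indirect):
        raise ValueError("series must have equal length")
    return [d * (1.0 + alpha * i) for d, i in zip(direct, indirect)]


# Synthetic weekly series generated under the same toy assumption (not study data).
cases = [10, 40, 80, 60, 30, 10]   # stand-in for gold-standard case counts
indirect = [0, 1, 2, 4, 4, 3]      # stand-in for mentions of flu in other places
alpha = 0.5                        # hypothetical inhibition strength
direct = [c / (1.0 + alpha * i) for c, i in zip(cases, indirect)]

r_direct = pearson_r(direct, cases)
r_corrected = pearson_r(inhibition_corrected(direct, indirect, alpha), cases)
print(f"direct only: r = {r_direct:.2f}")
print(f"corrected:   r = {r_corrected:.2f}")
```

Because the synthetic direct counts are generated with the same damping form, the corrected series recovers the case counts by construction, so its correlation is essentially 1. On real tweet counts, the form of the inhibition and its strength would have to be chosen and fitted, and the .36 and .70 correlations above come from the study, not from this sketch.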
Record ID:	pubmed-6231889
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Journal:	JMIR Public Health and Surveillance (JMIR Public Health Surveill)
Published online:	2018-09-25
Copyright:	©Shoko Wakamiya, Yukiko Kawai, Eiji Aramaki. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 25.09.2018.
License:	This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.