Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance
We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome—ast...
Main Authors: | Edo-Osagie, Oduwa; Smith, Gillian; Lake, Iain; Edeghere, Obaghe; De La Iglesia, Beatriz |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638773/ https://www.ncbi.nlm.nih.gov/pubmed/31318885 http://dx.doi.org/10.1371/journal.pone.0210689 |
_version_ | 1783436368966320128 |
---|---|
author | Edo-Osagie, Oduwa; Smith, Gillian; Lake, Iain; Edeghere, Obaghe; De La Iglesia, Beatriz |
author_facet | Edo-Osagie, Oduwa; Smith, Gillian; Lake, Iain; Edeghere, Obaghe; De La Iglesia, Beatriz |
author_sort | Edo-Osagie, Oduwa |
collection | PubMed |
description | We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome: asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while only doing very minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet’s tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and find that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we found some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions. |
format | Online Article Text |
id | pubmed-6638773 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6638773 2019-07-25 Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance Edo-Osagie, Oduwa Smith, Gillian Lake, Iain Edeghere, Obaghe De La Iglesia, Beatriz PLoS One Research Article We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome: asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while only doing very minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet’s tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and find that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we found some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions. Public Library of Science 2019-07-18 /pmc/articles/PMC6638773/ /pubmed/31318885 http://dx.doi.org/10.1371/journal.pone.0210689 Text en © 2019 Edo-Osagie et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Edo-Osagie, Oduwa Smith, Gillian Lake, Iain Edeghere, Obaghe De La Iglesia, Beatriz Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance |
title | Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance |
title_full | Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance |
title_fullStr | Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance |
title_full_unstemmed | Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance |
title_short | Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance |
title_sort | twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638773/ https://www.ncbi.nlm.nih.gov/pubmed/31318885 http://dx.doi.org/10.1371/journal.pone.0210689 |
work_keys_str_mv | AT edoosagieoduwa twitterminingusingsemisupervisedclassificationforrelevancefilteringinsyndromicsurveillance AT smithgillian twitterminingusingsemisupervisedclassificationforrelevancefilteringinsyndromicsurveillance AT lakeiain twitterminingusingsemisupervisedclassificationforrelevancefilteringinsyndromicsurveillance AT edeghereobaghe twitterminingusingsemisupervisedclassificationforrelevancefilteringinsyndromicsurveillance AT delaiglesiabeatriz twitterminingusingsemisupervisedclassificationforrelevancefilteringinsyndromicsurveillance |
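
The description field above reports that word-level n-grams combined with counts of negative and laughter emojis gave the best classification performance. The following is a minimal sketch of what such a feature extractor might look like, not the authors' implementation. The emoji sets, the tokenizer pattern, and the names `word_ngrams` and `extract_features` are all assumptions made for illustration; the record does not list the emojis or the pre-processing the authors used.

```python
import re
from collections import Counter

# Hypothetical emoji groups, chosen only for illustration.
# The catalog record does not say which emojis the authors used.
NEGATIVE_EMOJIS = {"😷", "😢", "😫", "🤒"}
LAUGHTER_EMOJIS = {"😂", "🤣", "😆"}

# Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def word_ngrams(text, n_max=2):
    """Return all word-level n-grams of the text for n = 1..n_max."""
    tokens = TOKEN_RE.findall(text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def extract_features(tweet, n_max=2):
    """Count word n-grams and add two emoji-count features for tone."""
    features = Counter(word_ngrams(tweet, n_max))
    features["__negative_emoji__"] = sum(ch in NEGATIVE_EMOJIS for ch in tweet)
    features["__laughter_emoji__"] = sum(ch in LAUGHTER_EMOJIS for ch in tweet)
    return features


if __name__ == "__main__":
    example = extract_features("Can't breathe, my asthma is so bad 😷😫")
    for name, count in sorted(example.items()):
        print(name, count)
```

A feature dictionary of this kind could be passed to any bag-of-features classifier. Keeping the emoji counts as separate features, rather than treating emojis as ordinary tokens, lets a classifier weigh tone independently of wording.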