Content Analysis of Syndromic Twitter Data

OBJECTIVE: We present an annotation scheme developed to analyze syndromic Twitter data, and the results of its application to a set of respiratory syndrome-related tweets [1]. The scheme was designed to differentiate true positive tweets (where an individual is experiencing respiratory symptoms) fro...

Bibliographic Details
Main Authors:	Keffala, Bethany, Conway, Mike, Doan, Son, Collier, Nigel
Format:	Online Article Text
Language:	English
Published:	University of Illinois at Chicago Library 2013
Subjects:	ISDS 2012 Conference Abstracts
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692812/

_version_	1782274661195710464
author	Keffala, Bethany Conway, Mike Doan, Son Collier, Nigel
author_facet	Keffala, Bethany Conway, Mike Doan, Son Collier, Nigel
author_sort	Keffala, Bethany
collection	PubMed
description	OBJECTIVE: We present an annotation scheme developed to analyze syndromic Twitter data, and the results of its application to a set of respiratory syndrome-related tweets [1]. The scheme was designed to differentiate true positive tweets (where an individual is experiencing respiratory symptoms) from false positive tweets (where an individual is not experiencing respiratory symptoms), and to quantify more fine-grained information within the data. INTRODUCTION: The popularity of Twitter, a social-networking service, creates the opportunity for researchers to collect large amounts of free, localizable data in real time. Data take the form of short, user-written messages, and have been employed for general syndromic surveillance [2] and surveillance of public attitudes toward the H1N1 flu outbreak [3]. Accessibility of tweets in real time makes them particularly appropriate for use in early warning systems. Data collected through keyword search contain a significant amount of noise; however, annotation can help boost the signal for true positive tweets. METHODS: The annotation scheme was developed based on information relevant for early warning systems (e.g. who is experiencing symptoms, and when) as well as other information present in the tweets (e.g. aspirations regarding symptoms, or abuse of substances such as cough syrup). Categories included Experiencer: Self/Other, Temporality: Current/Non-Current, Sentiment: Positive/Negative, Information: Providing/Seeking, Language: Non-English, Aspiration, Hyperbole, and Substance Abuse. All categories with the exception of Language and Substance Abuse were defined in reference to diseases or symptoms. The scheme was applied to 1,100 respiratory syndrome-related tweets (544 false positive, 556 true positive) from a previously collected corpus of syndromic Twitter data [2]. Inter-annotator agreement was calculated for 9% of the data (100 tweets). 
RESULTS: Inter-annotator agreement was generally good; however, certain categories had lower scores. Categories for Experiencer, Temporality, Sentiment: Negative, Information: Providing, and Language all had Kappa values above .9; Sentiment: Positive, Aspiration, and Substance Abuse had Kappa values above .7; and Information: Seeking and Hyperbole had Kappa values above .6. There was good separation between true positive tweets and false positive tweets, especially for the Experiencer: Self, Temporality: Current, Sentiment: Negative, Aspiration, Hyperbole, and Substance Abuse categories (see Table). True positive data were more likely to belong to any category except Information: Providing and Substance Abuse, for which false positive tweets had a greater likelihood of category inclusion. Within the true positive data, we found that users were more likely to reference symptoms that they themselves were currently experiencing than they were to reference another person’s symptoms or non-current symptoms. Sentiment was largely negative, and there was significant use of aspiration and hyperbole. CONCLUSIONS: Future work will apply the scheme to other syndromes, including constitutional, gastrointestinal, neurological, rash, and hemorrhagic.
format	Online Article Text
id	pubmed-3692812
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	University of Illinois at Chicago Library
record_format	MEDLINE/PubMed
spelling	pubmed-3692812 2013-06-26 Content Analysis of Syndromic Twitter Data Keffala, Bethany Conway, Mike Doan, Son Collier, Nigel Online J Public Health Inform ISDS 2012 Conference Abstracts University of Illinois at Chicago Library 2013-04-04 /pmc/articles/PMC3692812/ Text en ©2013 the author(s) http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/ojphi/about/submissions#copyrightNotice This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.
spellingShingle	ISDS 2012 Conference Abstracts Keffala, Bethany Conway, Mike Doan, Son Collier, Nigel Content Analysis of Syndromic Twitter Data
title	Content Analysis of Syndromic Twitter Data
title_full	Content Analysis of Syndromic Twitter Data
title_fullStr	Content Analysis of Syndromic Twitter Data
title_full_unstemmed	Content Analysis of Syndromic Twitter Data
title_short	Content Analysis of Syndromic Twitter Data
title_sort	content analysis of syndromic twitter data
topic	ISDS 2012 Conference Abstracts
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692812/
work_keys_str_mv	AT keffalabethany contentanalysisofsyndromictwitterdata AT conwaymike contentanalysisofsyndromictwitterdata AT doanson contentanalysisofsyndromictwitterdata AT colliernigel contentanalysisofsyndromictwitterdata
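
The abstract reports inter-annotator agreement as Kappa values per category, computed on 100 doubly annotated tweets. The record does not say which Kappa statistic was used or how it was computed, so the following is only a minimal sketch that assumes Cohen's kappa for two annotators and one binary category (for example, Experiencer: Self). The function name and the ten sample labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: this is the statistic behind the reported "Kappa" values;
    the source record does not confirm it.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for one category on ten tweets (1 = category applies).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

In this made-up example the annotators agree on 9 of 10 items while chance agreement is 0.5, which is why the statistic comes out lower than the raw agreement rate. Applied per category, a calculation of this kind would yield one value for each of the scheme's categories.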
