Detecting the Determinants of Health in Social Media

Bibliographic Details
Main authors: Rivers, Caitlin; Lewis, Bryan; Young, Sean
Format: Online; Article; Text
Language: English
Published: University of Illinois at Chicago Library, 2013
Subjects: ISDS 2012 Conference Abstracts
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692811/
collection PubMed
description OBJECTIVE: Create an analysis pipeline that can detect the behavioral determinants of disease in the population using social media data.
INTRODUCTION: The explosive growth in the use of social media sites presents a unique opportunity to develop alternative methods for understanding the health of the public. The near ubiquity of smartphones has further increased the volume and resolution of the data shared through these sites. The emerging field of digital epidemiology [1] has focused on methods to analyze and use this “digital exhaust” to augment traditional epidemiologic methods. When applied to disease detection, these methods often detect outbreaks 1–2 weeks earlier than their traditional counterparts [1]. Many of these approaches successfully employ data mining techniques to detect symptoms associated with influenza-like illness [2]. Others can identify the appearance of novel symptom patterns, making it possible to detect the emergence of a new illness in a population [3]. However, behaviors that increase the risk of disease have not yet received this treatment.
METHODS: We have created a methodology that can detect the behavioral determinants of disease in the population. We initially focused on risky behaviors that can contribute to HIV transmission in a population; however, the methodology is generalizable. We collected 15 million tweets based on 32 broad keywords relating to three types of risky behavior associated with HIV transmission: drug use (e.g., meth), risky sexual behaviors (e.g., bareback), and other STIs (e.g., herpes). We then hand-coded a subset of 2,537 unique tweets using a crowd-sourceable “game” that can be distributed online. This hand-coded set was used to train a simple n-gram classifier, yielding an algorithm that selects relevant tweets from the larger database. We then generated geocodes from the text locations provided by tweet authors, supplemented by the ∼1% of tweets that are already geolocated.
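The abstract does not say how text locations were turned into coordinates. A minimal sketch of the geocoding step above, assuming each tweet is a dict with an optional `coordinates` pair and a free-text `user_location` string (hypothetical field names, not the Twitter API's payload format), and using a tiny hypothetical in-memory `GAZETTEER` in place of a real geocoding service. It also assumes that a native geotag, when present, takes precedence over the author's text location; the abstract does not specify the order.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy gazetteer: normalized place string -> (lat, lon, state, county).
# A real pipeline would call a geocoding service or use a full gazetteer instead.
GAZETTEER = {
    "richmond, va": (37.5407, -77.4360, "VA", "Richmond city"),
    "los angeles, ca": (34.0522, -118.2437, "CA", "Los Angeles"),
}


@dataclass
class Geocode:
    lat: float
    lon: float
    source: str                   # "geotag" (native) or "text" (author-supplied location)
    state: Optional[str] = None   # filled only for gazetteer hits in this sketch
    county: Optional[str] = None


def normalize(place: str) -> str:
    """Lower-case and collapse runs of whitespace so lookups tolerate formatting noise."""
    return " ".join(place.lower().split())


def resolve_location(tweet: dict) -> Optional[Geocode]:
    """Assumption: prefer a native geotag; otherwise geocode the free-text location."""
    coords = tweet.get("coordinates")  # assumed (lat, lon) tuple, or None
    if coords is not None:
        lat, lon = coords
        # A real pipeline would reverse-geocode this point to a state and county.
        return Geocode(lat, lon, source="geotag")
    place = tweet.get("user_location")
    if not place:
        return None  # no location information at all
    hit = GAZETTEER.get(normalize(place))
    if hit is None:
        return None  # text location could not be resolved
    lat, lon, state, county = hit
    return Geocode(lat, lon, source="text", state=state, county=county)


# Usage with made-up tweets: a geotagged one, a resolvable text location, and an
# unresolvable one.
tweets = [
    {"coordinates": (38.03, -78.48), "user_location": "somewhere"},
    {"coordinates": None, "user_location": "Richmond,  VA"},
    {"coordinates": None, "user_location": "the moon"},
]
for t in tweets:
    print(resolve_location(t))
```

Returning `None` for unresolved locations keeps them countable, so the share of text locations that received coordinates can be measured directly.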
We aggregated these geocodes to the state and county levels, which allowed us to compare our collected data with public health data on HIV prevalence.
RESULTS: We present the correlation between behaviors identified in social media and the corresponding impacts on disease incidence across a large population. Hand coding revealed that 34% of tweets containing one or more of the 32 initial keywords were relevant to behaviors associated with HIV transmission. Among the three categories of initial search terms, the drug category yielded 21% true positives, compared with 9% for risky sexual behaviors and 2% for other STIs. The n-gram classifier achieved 66% sensitivity and 44% specificity on a test set. Our geolocation algorithm found coordinates for 88% of text locations; in a test sample of 59 of these, 83% were correctly geolocated. Together, these components form an analysis pipeline for detecting risky behaviors across the United States.
CONCLUSIONS: We present a surveillance methodology to help sift through the vast volume of these data to detect behaviors and determinants of health that contribute to both disease transmission and chronic illness. This effort makes it possible to identify at-risk communities and populations, which will facilitate targeted primary and secondary prevention efforts to improve public health.
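The abstract names only "a simple n-gram classifier" and reports its sensitivity and specificity. The sketch below assumes a multinomial naive Bayes over unigram and bigram counts with add-one smoothing (a swapped-in choice, not necessarily the authors' model) and computes sensitivity as TP/(TP+FN) and specificity as TN/(TN+FP), treating "relevant" as the positive class. The names `ngrams`, `NgramNaiveBayes`, and `sensitivity_specificity` are hypothetical, and the four training tweets are made-up illustration, not study data.

```python
import math
from collections import Counter


def ngrams(text: str, n_max: int = 2) -> list[str]:
    """Return all 1- to n_max-grams over lower-cased whitespace tokens."""
    tokens = text.lower().split()
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


class NgramNaiveBayes:
    """Multinomial naive Bayes over n-gram counts with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[bool]) -> "NgramNaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.gram_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(ngrams(text))
        self.vocab_size = len(set().union(*self.gram_counts.values()))
        self.totals = {c: sum(self.gram_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def predict(self, text: str) -> bool:
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of every n-gram in the text.
            score = math.log(self.doc_counts[c] / self.n_docs)
            for g in ngrams(text):
                count = self.gram_counts[c][g]  # Counter returns 0 for unseen n-grams
                score += math.log((count + 1) / (self.totals[c] + self.vocab_size))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def sensitivity_specificity(y_true, y_pred) -> tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); NaN if a class is absent."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif t:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Made-up toy data for illustration only; True = relevant to a risk behavior.
train_texts = ["using meth tonight", "high on meth again",
               "meth lab raided by police", "news report on meth"]
train_labels = [True, True, False, False]
clf = NgramNaiveBayes().fit(train_texts, train_labels)

test_texts = ["meth tonight", "police raided the lab"]
test_labels = [True, False]
preds = [clf.predict(t) for t in test_texts]
print(sensitivity_specificity(test_labels, preds))
```

Naive Bayes trains in a single counting pass, so a model of this kind could be refit cheaply as additional crowd-sourced labels arrive; any real evaluation would use a held-out test set far larger than this toy one.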
id pubmed-3692811
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3692811 2013-06-26 Online J Public Health Inform, ISDS 2012 Conference Abstracts. University of Illinois at Chicago Library, 2013-04-04. /pmc/articles/PMC3692811/ Text en. ©2013 the author(s). http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/ojphi/about/submissions#copyrightNotice
This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.
topic ISDS 2012 Conference Abstracts