A Scalable Framework to Detect Personal Health Mentions on Twitter
BACKGROUND: Biomedical research has traditionally been conducted via surveys and the analysis of medical records. However, these resources are limited in their content, such that non-traditional domains (eg, online forums and social media) have an opportunity to supplement the view of an individual’...
Main authors: | Yin, Zhijun; Fabbri, Daniel; Rosenbloom, S Trent; Malin, Bradley |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications Inc. 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4526910/ https://www.ncbi.nlm.nih.gov/pubmed/26048075 http://dx.doi.org/10.2196/jmir.4305 |
_version_ | 1782384491892834304 |
---|---|
author | Yin, Zhijun; Fabbri, Daniel; Rosenbloom, S Trent; Malin, Bradley |
author_sort | Yin, Zhijun |
collection | PubMed |
description | BACKGROUND: Biomedical research has traditionally been conducted via surveys and the analysis of medical records. However, these resources are limited in their content, such that non-traditional domains (eg, online forums and social media) have an opportunity to supplement the view of an individual’s health. OBJECTIVE: The objective of this study was to develop a scalable framework to detect personal health status mentions on Twitter and assess the extent to which such information is disclosed. METHODS: We collected more than 250 million tweets via the Twitter streaming API over a 2-month period in 2014. The corpus was filtered down to approximately 250,000 tweets, stratified across 34 high-impact health issues, based on guidance from the Medical Expenditure Panel Survey. We created a labeled corpus of several thousand tweets via a survey, administered over Amazon Mechanical Turk, that documents when terms correspond to mentions of personal health issues or an alternative (eg, a metaphor). We engineered a scalable classifier for personal health mentions via feature selection and assessed its potential over the health issues. We further investigated the utility of the tweets by determining the extent to which Twitter users disclose personal health status. RESULTS: Our investigation yielded several notable findings. First, we find that tweets from a small subset of the health issues can train a scalable classifier to detect health mentions. Specifically, training on 2000 tweets from four health issues (cancer, depression, hypertension, and leukemia) yielded a classifier with precision of 0.77 on all 34 health issues. Second, Twitter users disclosed personal health status for all health issues. Notably, personal health status was disclosed over 50% of the time for 11 out of 34 (33%) investigated health issues. Third, the disclosure rate was dependent on the health issue in a statistically significant manner (P<.001). For instance, more than 80% of the tweets about migraines (83/100) and allergies (85/100) communicated personal health status, while only around 10% of the tweets about obesity (13/100) and heart attack (12/100) did so. Fourth, the likelihood that people disclose their own versus other people’s health status was dependent on health issue in a statistically significant manner as well (P<.001). For example, 69% (69/100) of the insomnia tweets disclosed the author’s status, while only 1% (1/100) disclosed another person’s status. By contrast, 1% (1/100) of the Down syndrome tweets disclosed the author’s status, while 21% (21/100) disclosed another person’s status. CONCLUSIONS: It is possible to automatically detect personal health status mentions on Twitter in a scalable manner. These mentions correspond to the health issues of the Twitter users themselves, but also other individuals. Though this study did not investigate the veracity of such statements, we anticipate such information may be useful in supplementing traditional health-related sources for research purposes. |
format | Online Article Text |
id | pubmed-4526910 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | JMIR Publications Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-4526910 2015-08-11 A Scalable Framework to Detect Personal Health Mentions on Twitter Yin, Zhijun; Fabbri, Daniel; Rosenbloom, S Trent; Malin, Bradley J Med Internet Res Original Paper JMIR Publications Inc. 2015-06-05 /pmc/articles/PMC4526910/ /pubmed/26048075 http://dx.doi.org/10.2196/jmir.4305 Text en ©Zhijun Yin, Daniel Fabbri, S Trent Rosenbloom, Bradley Malin. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 05.06.2015. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included. |
title | A Scalable Framework to Detect Personal Health Mentions on Twitter |
title_sort | scalable framework to detect personal health mentions on twitter |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4526910/ https://www.ncbi.nlm.nih.gov/pubmed/26048075 http://dx.doi.org/10.2196/jmir.4305 |
work_keys_str_mv | AT yinzhijun ascalableframeworktodetectpersonalhealthmentionsontwitter AT fabbridaniel ascalableframeworktodetectpersonalhealthmentionsontwitter AT rosenbloomstrent ascalableframeworktodetectpersonalhealthmentionsontwitter AT malinbradley ascalableframeworktodetectpersonalhealthmentionsontwitter AT yinzhijun scalableframeworktodetectpersonalhealthmentionsontwitter AT fabbridaniel scalableframeworktodetectpersonalhealthmentionsontwitter AT rosenbloomstrent scalableframeworktodetectpersonalhealthmentionsontwitter AT malinbradley scalableframeworktodetectpersonalhealthmentionsontwitter |
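
The abstract reports that the disclosure rate depends on the health issue (P<.001) and quotes counts for four of the 34 issues. The following is a minimal sketch, not the authors' analysis: it assumes a Pearson chi-square test of independence (the abstract does not name the test used) and applies it only to the four quoted counts. The names `reported`, `disclosure_rates`, and `chi_square_statistic` are hypothetical and introduced for illustration.

```python
# Counts quoted in the abstract: (tweets disclosing personal health status, tweets sampled).
# Only four of the 34 health issues are quoted there, so this is an illustration only.
reported = {
    "migraines": (83, 100),
    "allergies": (85, 100),
    "obesity": (13, 100),
    "heart attack": (12, 100),
}


def disclosure_rates(counts):
    """Return the fraction of sampled tweets that disclose personal health status."""
    return {issue: disclosed / sampled for issue, (disclosed, sampled) in counts.items()}


def chi_square_statistic(counts):
    """Pearson chi-square statistic for an (issues x disclosed/not disclosed) table.

    ASSUMPTION: the abstract does not say which test produced P<.001;
    a chi-square test of independence is used here as a plausible stand-in.
    """
    rows = [(disclosed, sampled - disclosed) for disclosed, sampled in counts.values()]
    total = sum(sum(row) for row in rows)
    column_totals = (sum(row[0] for row in rows), sum(row[1] for row in rows))
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, column_total in zip(row, column_totals):
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Standard chi-square critical value for 3 degrees of freedom at alpha = 0.001.
CRITICAL_DF3_P001 = 16.266

rates = disclosure_rates(reported)
chi2 = chi_square_statistic(reported)
degrees_of_freedom = (len(reported) - 1) * (2 - 1)

for issue, rate in rates.items():
    print(f"{issue}: {rate:.0%} of sampled tweets disclose personal health status")
print(f"chi-square = {chi2:.1f} with {degrees_of_freedom} degrees of freedom")
print("exceeds the P<.001 critical value:", chi2 > CRITICAL_DF3_P001)
```

On these four issues alone the statistic is far above the critical value, which is consistent with the reported dependence, but it does not reproduce the study's result over all 34 issues.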