Characterizing Tweet Volume and Content About Common Health Conditions Across Pennsylvania: Retrospective Analysis

Bibliographic Details
Main authors: Tufts, Christopher; Polsky, Daniel; Volpp, Kevin G; Groeneveld, Peter W; Ungar, Lyle; Merchant, Raina M; Pelullo, Arthur P
Format: Online, Article, Text
Language: English
Published: JMIR Publications, 2018
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302232/
https://www.ncbi.nlm.nih.gov/pubmed/30522989
http://dx.doi.org/10.2196/10834

Description
BACKGROUND: Tweets can provide broad, real-time perspectives about health and medical diagnoses that can inform disease surveillance in geographic regions. Less is known, however, about how much individuals post about common health conditions or what they post about.
OBJECTIVE: We sought to collect and analyze tweets from 1 state about high-prevalence health conditions and characterize the tweet volume and content.
METHODS: We collected 408,296,620 tweets originating in Pennsylvania from 2012 to 2015 and compared the prevalence of 14 common diseases to the frequency of disease mentions on Twitter. We identified and corrected bias induced by variance in disease term specificity and used the machine learning approach of differential language analysis to determine the content (words and themes) most highly correlated with each disease.
RESULTS: Common disease terms were included in 226,802 tweets (174,381 tweets after disease term correction). Posts about breast cancer (39,156/174,381 messages, 22.45%, vs prevalence of 306,127/12,702,379, 2.41%) and diabetes (40,217/174,381 messages, 23.06%, vs prevalence of 2,189,890/12,702,379, 17.24%) were overrepresented on Twitter relative to disease prevalence, whereas hypertension (17,245/174,381 messages, 9.89%, vs prevalence of 4,614,776/12,702,379, 36.33%), chronic obstructive pulmonary disease (1648/174,381 messages, 0.95%, vs prevalence of 1,083,627/12,702,379, 8.53%), and heart disease (13,669/174,381 messages, 7.84%, vs prevalence of 2,461,721/12,702,379, 19.38%) were underrepresented. The content of messages also varied by disease. Personal experience messages accounted for 12.88% (578/4487) of prostate cancer tweets and 24.17% (4046/16,742) of asthma tweets. Awareness themes appeared more often in breast cancer tweets (9139/39,156 messages, 23.34%) than in asthma tweets (1040/16,742 messages, 6.21%). Risk factors were discussed more often in heart disease tweets (1375/13,669 messages, 10.06%) than in lymphoma tweets (105/4927 messages, 2.13%).
CONCLUSIONS: Twitter provides a window into the Web-based visibility of diseases and into how the volume of Web-based content about diseases varies by condition. Further, the potential value of tweets lies in the rich content they provide about individuals' perspectives on diseases (eg, personal experiences, awareness, and risk factors), which is not otherwise easily captured through traditional surveys or administrative data.
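
The over- and underrepresentation finding rests on simple arithmetic: each condition's share of the 174,381 corrected disease tweets is set against its share of the 12,702,379 denominator used for prevalence. The Python sketch below reproduces those two shares from the counts quoted in the abstract. It is not the authors' code; the ratio of the two shares, the `representation` helper, and the `Representation` class are hypothetical conveniences for summarizing the comparison.

```python
from dataclasses import dataclass

# Denominators quoted in the abstract.
TOTAL_DISEASE_TWEETS = 174_381       # disease tweets after term correction
PREVALENCE_DENOMINATOR = 12_702_379  # denominator for the prevalence percentages

# (tweets mentioning the condition, prevalent cases), as quoted in the abstract.
COUNTS = {
    "breast cancer": (39_156, 306_127),
    "diabetes": (40_217, 2_189_890),
    "hypertension": (17_245, 4_614_776),
    "COPD": (1_648, 1_083_627),
    "heart disease": (13_669, 2_461_721),
}


@dataclass(frozen=True)
class Representation:
    tweet_share: float       # fraction of disease tweets about this condition
    prevalence_share: float  # fraction of the denominator with this condition

    @property
    def ratio(self) -> float:
        # Hypothetical summary metric (not defined in the abstract):
        # > 1 means more Twitter attention than prevalence alone suggests.
        return self.tweet_share / self.prevalence_share

    @property
    def label(self) -> str:
        if self.ratio > 1:
            return "overrepresented"
        if self.ratio < 1:
            return "underrepresented"
        return "proportional"


def representation(tweets: int, cases: int) -> Representation:
    """Hypothetical helper: convert raw counts into the two shares."""
    return Representation(tweets / TOTAL_DISEASE_TWEETS, cases / PREVALENCE_DENOMINATOR)


if __name__ == "__main__":
    for disease, (tweets, cases) in COUNTS.items():
        r = representation(tweets, cases)
        print(f"{disease:<14} tweets {r.tweet_share:6.2%}  "
              f"prevalence {r.prevalence_share:6.2%}  ratio {r.ratio:5.2f}  {r.label}")
```

Run as a script, it prints one line per condition; breast cancer comes out at a ratio of roughly 9 and COPD at roughly 0.1, matching the abstract's over- and underrepresented groupings.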

Copyright and License
©Christopher Tufts, Daniel Polsky, Kevin G Volpp, Peter W Groeneveld, Lyle Ungar, Raina M Merchant, Arthur P Pelullo. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 06.12.2018. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.