Discovering Health Topics in Social Media Using Topic Models
By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goa...
Main authors: | Paul, Michael J., Dredze, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118877/ https://www.ncbi.nlm.nih.gov/pubmed/25084530 http://dx.doi.org/10.1371/journal.pone.0103408 |
_version_ | 1782328898595323904 |
---|---|
author | Paul, Michael J. Dredze, Mark |
author_facet | Paul, Michael J. Dredze, Mark |
author_sort | Paul, Michael J. |
collection | PubMed |
description | By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with seasonal influenza (r = 0.689) and allergies (r = 0.810) temporal surveillance data, as well as exercise (r = 0.534) and obesity (r = −0.631) related geographic survey data in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media. |
format | Online Article Text |
id | pubmed-4118877 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4118877 2014-08-04 Discovering Health Topics in Social Media Using Topic Models Paul, Michael J. Dredze, Mark PLoS One Research Article Public Library of Science 2014-08-01 /pmc/articles/PMC4118877/ /pubmed/25084530 http://dx.doi.org/10.1371/journal.pone.0103408 Text en © 2014 Paul, Dredze http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Paul, Michael J. Dredze, Mark Discovering Health Topics in Social Media Using Topic Models |
title | Discovering Health Topics in Social Media Using Topic Models |
title_sort | discovering health topics in social media using topic models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118877/ https://www.ncbi.nlm.nih.gov/pubmed/25084530 http://dx.doi.org/10.1371/journal.pone.0103408 |
work_keys_str_mv | AT paulmichaelj discoveringhealthtopicsinsocialmediausingtopicmodels AT dredzemark discoveringhealthtopicsinsocialmediausingtopicmodels |