
Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?

BACKGROUND: User content posted through Twitter has been used for biosurveillance, to characterize public perception of health-related topics, and as a means of distributing information to the general public. Most of the existing work surrounding Twitter and health care has shown Twitter to be an ef...


Bibliographic Details
Main authors:	Kendra, Rachel Lynn; Karki, Suman; Eickholt, Jesse Lee; Gandy, Lisa
Format:	Online Article Text
Language:	English
Published:	JMIR Publications Inc. 2015
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4526952/ https://www.ncbi.nlm.nih.gov/pubmed/26091775 http://dx.doi.org/10.2196/jmir.4220

_version_	1782384501378252800
author	Kendra, Rachel Lynn; Karki, Suman; Eickholt, Jesse Lee; Gandy, Lisa
author_facet	Kendra, Rachel Lynn; Karki, Suman; Eickholt, Jesse Lee; Gandy, Lisa
author_sort	Kendra, Rachel Lynn
collection	PubMed
description	BACKGROUND: User content posted through Twitter has been used for biosurveillance, to characterize public perception of health-related topics, and as a means of distributing information to the general public. Most of the existing work surrounding Twitter and health care has shown Twitter to be an effective medium for these problems but more could be done to provide finer and more efficient access to all pertinent data. Given the diversity of user-generated content, small samples or summary presentations of the data arguably omit a large part of the virtual discussion taking place in the Twittersphere. Still, managing, processing, and querying large amounts of Twitter data is not a trivial task. This work describes tools and techniques capable of handling larger sets of Twitter data and demonstrates their use with the issue of antibiotics. OBJECTIVE: This work has two principal objectives: (1) to provide an open-source means to efficiently explore all collected tweets and query health-related topics on Twitter, specifically, questions such as what users are saying and how messages are spread, and (2) to characterize the larger discourse taking place on Twitter with respect to antibiotics. METHODS: Open-source software suites Hadoop, Flume, and Hive were used to collect and query a large number of Twitter posts. To classify tweets by topic, a deep network classifier was trained using a limited number of manually classified tweets. The particular machine learning approach used also allowed the use of a large number of unclassified tweets to increase performance. RESULTS: Query-based analysis of the collected tweets revealed that a large number of users contributed to the online discussion and that a frequent topic mentioned was resistance. A number of prominent events related to antibiotics led to spikes in activity, but these were short in duration. The category-based classifier developed was able to correctly classify 70% of manually labeled tweets (using a 10-fold cross-validation procedure and 9 classes). The classifier also performed well when evaluated on a per-category basis. CONCLUSIONS: Using existing tools such as Hive, Flume, Hadoop, and machine learning techniques, it is possible to construct tools and workflows to collect and query large amounts of Twitter data to characterize the larger discussion taking place on Twitter with respect to a particular health-related topic. Furthermore, using newer machine learning techniques and a limited number of manually labeled tweets, an entire body of collected tweets can be classified to indicate what topics are driving the virtual, online discussion. The resulting classifier can also be used to efficiently explore collected tweets by category and search for messages of interest or exemplary content.
format	Online Article Text
id	pubmed-4526952
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	JMIR Publications Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-4526952 2015-08-11 Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture? Kendra, Rachel Lynn; Karki, Suman; Eickholt, Jesse Lee; Gandy, Lisa J Med Internet Res Original Paper JMIR Publications Inc. 2015-06-19 /pmc/articles/PMC4526952/ /pubmed/26091775 http://dx.doi.org/10.2196/jmir.4220 Text en ©Rachel Lynn Kendra, Suman Karki, Jesse Lee Eickholt, Lisa Gandy. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 19.06.2015. https://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Kendra, Rachel Lynn Karki, Suman Eickholt, Jesse Lee Gandy, Lisa Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?
title	Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?
title_full	Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?
title_fullStr	Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?
title_full_unstemmed	Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?
title_short	Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture?
title_sort	characterizing the discussion of antibiotics in the twittersphere: what is the bigger picture?
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4526952/ https://www.ncbi.nlm.nih.gov/pubmed/26091775 http://dx.doi.org/10.2196/jmir.4220
work_keys_str_mv	AT kendrarachellynn characterizingthediscussionofantibioticsinthetwitterspherewhatisthebiggerpicture AT karkisuman characterizingthediscussionofantibioticsinthetwitterspherewhatisthebiggerpicture AT eickholtjesselee characterizingthediscussionofantibioticsinthetwitterspherewhatisthebiggerpicture AT gandylisa characterizingthediscussionofantibioticsinthetwitterspherewhatisthebiggerpicture


Similar Items