Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid

BACKGROUND: The wide adoption of social media in daily life renders it a rich and effective resource for conducting near real-time assessments of consumers’ perceptions of health services. However, its use in these assessments can be challenging because of the vast amount of data and the diversity o...

Bibliographic Details
Main Authors:	Yang, Yuan-Chi; Al-Garadi, Mohammed Ali; Bremer, Whitney; Zhu, Jane M; Grande, David; Sarker, Abeed
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8129876/ https://www.ncbi.nlm.nih.gov/pubmed/33938807 http://dx.doi.org/10.2196/26616

_version_	1783694394186006528
author	Yang, Yuan-Chi; Al-Garadi, Mohammed Ali; Bremer, Whitney; Zhu, Jane M; Grande, David; Sarker, Abeed
author_sort	Yang, Yuan-Chi
collection	PubMed
description	BACKGROUND: The wide adoption of social media in daily life makes it a rich and effective resource for conducting near real-time assessments of consumers’ perceptions of health services. However, its use in these assessments can be challenging because of the vast amount of data and the diversity of content in social media chatter.
OBJECTIVE: This study aims to develop and evaluate a system that uses natural language processing and machine learning to automatically characterize user-posted Twitter data about health services, using Medicaid, the single largest source of health coverage in the United States, as an example.
METHODS: We collected data from Twitter in two ways: via the public streaming application programming interface using Medicaid-related keywords (Corpus 1) and by using the website’s search option for tweets mentioning agency-specific handles (Corpus 2). We manually labeled a sample of tweets into 5 predetermined categories or as other, and artificially increased the number of training posts from specific low-frequency categories. Using the manually labeled data, we trained and evaluated several supervised learning algorithms, including support vector machine, random forest (RF), naïve Bayes, shallow neural network (NN), k-nearest neighbor, bidirectional long short-term memory, and bidirectional encoder representations from transformers (BERT). We then applied the best-performing classifier to the collected tweets for postclassification analyses to assess the utility of our methods.
RESULTS: We manually annotated 11,379 tweets (Corpus 1: 9179; Corpus 2: 2200) and used 7930 (69.7%) for training, 1449 (12.7%) for validation, and 2000 (17.6%) for testing. A classifier based on BERT obtained the highest accuracies (81.7%, Corpus 1; 80.7%, Corpus 2) and F1 scores on consumer feedback (0.58, Corpus 1; 0.90, Corpus 2), outperforming the second-best classifiers in terms of accuracy (74.6%, RF on Corpus 1; 69.4%, RF on Corpus 2) and F1 score on consumer feedback (0.44, NN on Corpus 1; 0.82, RF on Corpus 2). Postclassification analyses revealed differing intercorpora distributions of tweet categories, with political (400,778/628,411, 63.78%) and consumer feedback (15,073/27,337, 55.14%) tweets being the most frequent for Corpus 1 and Corpus 2, respectively.
CONCLUSIONS: The broad and variable content of Medicaid-related tweets necessitates automatic categorization to identify topic-relevant posts. Our proposed system presents a feasible solution for automatic categorization and can be deployed and generalized for health service programs other than Medicaid. Annotated data and methods are available for future studies.
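The split percentages and the two metrics named in the abstract (accuracy and per-category F1) can be illustrated with a short Python sketch. This is not the authors' code: the function names and the six toy labels are hypothetical and chosen only for illustration; the counts 7930, 1449, and 2000 are the only values taken from the abstract.

```python
def split_shares(train, validation, test):
    """Return each split's share of the annotated set, in percent (1 decimal)."""
    total = train + validation + test
    parts = (("train", train), ("validation", validation), ("test", test))
    return {name: round(100 * n / total, 1) for name, n in parts}


def accuracy(gold, predicted):
    """Fraction of tweets whose predicted category matches the manual label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def f1_for_class(gold, predicted, label):
    """F1 for one category: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Counts reported in the abstract (7930 + 1449 + 2000 = 11,379 tweets).
shares = split_shares(7930, 1449, 2000)

# Hypothetical toy labels, NOT data from the study.
gold = ["consumer_feedback", "political", "political",
        "other", "consumer_feedback", "political"]
pred = ["consumer_feedback", "political", "consumer_feedback",
        "other", "political", "political"]

toy_accuracy = accuracy(gold, pred)
toy_f1 = f1_for_class(gold, pred, "consumer_feedback")
print(shares, toy_accuracy, toy_f1)
```

Reporting F1 for a single category alongside overall accuracy matters when that category is rare: a classifier can reach high accuracy while still missing most posts of the category of interest.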
format	Online Article Text
id	pubmed-8129876
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-8129876 2021-05-24 Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid Yang, Yuan-Chi; Al-Garadi, Mohammed Ali; Bremer, Whitney; Zhu, Jane M; Grande, David; Sarker, Abeed J Med Internet Res Original Paper JMIR Publications 2021-05-03 /pmc/articles/PMC8129876/ /pubmed/33938807 http://dx.doi.org/10.2196/26616 Text en © Yuan-Chi Yang, Mohammed Ali Al-Garadi, Whitney Bremer, Jane M Zhu, David Grande, Abeed Sarker. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 03.05.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Yang, Yuan-Chi Al-Garadi, Mohammed Ali Bremer, Whitney Zhu, Jane M Grande, David Sarker, Abeed Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid
title	Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8129876/ https://www.ncbi.nlm.nih.gov/pubmed/33938807 http://dx.doi.org/10.2196/26616
