Using Social Media to Help Understand Patient-Reported Health Outcomes of Post–COVID-19 Condition: Natural Language Processing Approach

Bibliographic Details
Main authors: Dolatabadi, Elham, Moyano, Diana, Bales, Michael, Spasojevic, Sofija, Bhambhoria, Rohan, Bhatti, Junaid, Debnath, Shyamolima, Hoell, Nicholas, Li, Xin, Leng, Celine, Nanda, Sasha, Saab, Jad, Sahak, Esmat, Sie, Fanny, Uppal, Sara, Vadlamudi, Nirma Khatri, Vladimirova, Antoaneta, Yakimovich, Artur, Yang, Xiaoxue, Kocak, Sedef Akinli, Cheung, Angela M
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10510753/
https://www.ncbi.nlm.nih.gov/pubmed/37725432
http://dx.doi.org/10.2196/45767
collection PubMed
description BACKGROUND: While scientific knowledge of post–COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians. OBJECTIVE: In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. We compared the outputs with human annotations and clinical outcomes and tracked symptom and condition term occurrences over time and locations to explore the pipeline’s potential as a surveillance tool. METHODS: We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries. RESULTS: UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. 
Based on our findings, the top 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as fatigue), neuropsychiatric (such as anxiety and brain fog), and respiratory (such as shortness of breath). In addition, we also found novel symptom and condition terms that had not been categorized in previous studies, such as infection and pain. Regarding the co-occurring symptoms, the pair of fatigue and headaches was among the most co-occurring term pairs across both platforms. Based on the temporal analysis, the neuropsychiatric terms were the most prevalent, followed by the systemic category, on both social media platforms. Our spatial analysis concluded that 42% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, United Kingdom, and Canada. CONCLUSIONS: The outcome of our social media–derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient’s journey that can help health care providers anticipate future needs. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2022.12.14.22283419
format Online
Article
Text
id pubmed-10510753
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10510753 2023-09-21 J Med Internet Res Original Paper JMIR Publications 2023-09-19 /pmc/articles/PMC10510753/ /pubmed/37725432 http://dx.doi.org/10.2196/45767 Text en ©Elham Dolatabadi, Diana Moyano, Michael Bales, Sofija Spasojevic, Rohan Bhambhoria, Junaid Bhatti, Shyamolima Debnath, Nicholas Hoell, Xin Li, Celine Leng, Sasha Nanda, Jad Saab, Esmat Sahak, Fanny Sie, Sara Uppal, Nirma Khatri Vadlamudi, Antoaneta Vladimirova, Artur Yakimovich, Xiaoxue Yang, Sedef Akinli Kocak, Angela M Cheung. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 19.09.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
topic Original Paper