PASCLex: A comprehensive post-acute sequelae of COVID-19 (PASC) symptom lexicon derived from electronic health record clinical notes

OBJECTIVE: To develop a comprehensive post-acute sequelae of COVID-19 (PASC) symptom lexicon (PASCLex) from clinical notes to support PASC symptom identification and research. METHODS: We identified 26,117 COVID-19 positive patients from the Mass General Brigham’s electronic health records (EHR) and...

Bibliographic Details
Main authors: Wang, Liqin, Foer, Dinah, MacPhaul, Erin, Lo, Ying-Chih, Bates, David W., Zhou, Li
Format: Online Article Text
Language: English
Published: Elsevier Inc. 2022
Subjects: Original Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8590503/
https://www.ncbi.nlm.nih.gov/pubmed/34785382
http://dx.doi.org/10.1016/j.jbi.2021.103951
author Wang, Liqin
Foer, Dinah
MacPhaul, Erin
Lo, Ying-Chih
Bates, David W.
Zhou, Li
collection PubMed
description OBJECTIVE: To develop a comprehensive post-acute sequelae of COVID-19 (PASC) symptom lexicon (PASCLex) from clinical notes to support PASC symptom identification and research.
METHODS: We identified 26,117 COVID-19 positive patients from the Mass General Brigham electronic health records (EHR) and extracted 328,879 clinical notes from their post-acute infection period (day 51–110 from first positive COVID-19 test). PASCLex incorporated Unified Medical Language System® (UMLS) Metathesaurus concepts and synonyms based on selected semantic types. The MTERMS natural language processing (NLP) tool was used to automatically extract symptoms from a development dataset. The lexicon was iteratively revised with manual chart review, keyword search, concept consolidation, and evaluation of NLP output. We assessed the comprehensiveness of PASCLex and the NLP performance using a validation dataset and reported the symptom prevalence across the entire corpus.
RESULTS: PASCLex included 355 symptoms consolidated from 1520 UMLS concepts with 16,466 synonyms. NLP achieved an average precision of 0.94 and an estimated recall of 0.84. Symptoms with the highest frequency included pain (43.1%), anxiety (25.8%), depression (24.0%), fatigue (23.4%), joint pain (21.0%), shortness of breath (20.8%), headache (20.0%), nausea and/or vomiting (19.9%), myalgia (19.0%), and gastroesophageal reflux (18.6%).
DISCUSSION AND CONCLUSION: PASC symptoms are diverse. A comprehensive lexicon of PASC symptoms can be derived using an ontology-driven, EHR-guided, and NLP-assisted approach. By using unstructured data, this approach may improve identification and analysis of patient symptoms in the EHR, and inform prospective study design, preventative care strategies, and therapeutic interventions for patient care.
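The lexicon-driven approach summarized in the abstract can be illustrated with a minimal Python sketch. This is not MTERMS and not the authors' pipeline: the tiny `LEXICON` mapping, the function names, the plain regular-expression matching (with no negation or context handling), the inclusive treatment of days 51 and 110, and the per-patient prevalence denominator are all assumptions made for illustration only.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical miniature lexicon: synonym -> consolidated symptom.
# The real PASCLex maps 16,466 synonyms onto 355 symptoms.
LEXICON = {
    "shortness of breath": "shortness of breath",
    "dyspnea": "shortness of breath",
    "sob": "shortness of breath",
    "fatigue": "fatigue",
    "tiredness": "fatigue",
    "headache": "headache",
    "cephalalgia": "headache",
}

# One alternation, longest synonym first so multi-word phrases are preferred.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def in_post_acute_window(first_positive, note_date, start=51, end=110):
    """True if the note falls on day start..end (inclusive, an assumption)."""
    days = (note_date - first_positive).days
    return start <= days <= end


def extract_symptoms(text):
    """Return the set of consolidated symptoms whose synonyms occur in text."""
    return {LEXICON[m.group(1).lower()] for m in _PATTERN.finditer(text)}


def symptom_prevalence(patients):
    """patients: {patient_id: (first_positive_date, [(note_date, text), ...])}.

    Returns the fraction of patients with each symptom mentioned in at
    least one note inside the post-acute window.
    """
    counts = Counter()
    for first_positive, notes in patients.values():
        found = set()
        for note_date, text in notes:
            if in_post_acute_window(first_positive, note_date):
                found |= extract_symptoms(text)
        counts.update(found)
    return {symptom: n / len(patients) for symptom, n in counts.items()}


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


if __name__ == "__main__":
    # Made-up example notes, not data from the study.
    patients = {
        "p1": (date(2020, 4, 1), [
            (date(2020, 5, 31), "Reports dyspnea and fatigue."),
            (date(2020, 4, 10), "Headache today."),  # outside the window
        ]),
        "p2": (date(2020, 4, 1), [
            (date(2020, 6, 15), "Ongoing tiredness; SOB on exertion."),
        ]),
    }
    print(symptom_prevalence(patients))
```

A dictionary keyed by synonym keeps the consolidation step explicit: several surface forms collapse onto one symptom, mirroring the concept consolidation the abstract describes. A production tool would also need negation, family-history, and section handling, which this sketch omits.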
format Online
Article
Text
id pubmed-8590503
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier Inc.
record_format MEDLINE/PubMed
spelling pubmed-8590503 2021-11-15 PASCLex: A comprehensive post-acute sequelae of COVID-19 (PASC) symptom lexicon derived from electronic health record clinical notes Wang, Liqin Foer, Dinah MacPhaul, Erin Lo, Ying-Chih Bates, David W. Zhou, Li J Biomed Inform Original Research Elsevier Inc. 2022-01 2021-11-13 /pmc/articles/PMC8590503/ /pubmed/34785382 http://dx.doi.org/10.1016/j.jbi.2021.103951 Text en
© 2021 Elsevier Inc. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title PASCLex: A comprehensive post-acute sequelae of COVID-19 (PASC) symptom lexicon derived from electronic health record clinical notes
topic Original Research