Using Primary Care Clinical Text Data and Natural Language Processing to Identify Indicators of COVID-19 in Toronto, Canada
The objective of this study was to investigate whether a rule-based natural language processing (NLP) system, applied to primary care clinical text data, could be used to monitor COVID-19 viral activity in Toronto, Canada. We employed a retrospective cohort design. We included primary care patients...
Main Authors: | Meaney, Christopher; Moineddin, Rahim; Kalia, Sumeet; Aliarzadeh, Babak; Greiver, Michelle |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931279/ https://www.ncbi.nlm.nih.gov/pubmed/36812606 http://dx.doi.org/10.1371/journal.pdig.0000150 |
_version_ | 1784889214519738368 |
---|---|
author | Meaney, Christopher Moineddin, Rahim Kalia, Sumeet Aliarzadeh, Babak Greiver, Michelle |
author_sort | Meaney, Christopher |
collection | PubMed |
description | The objective of this study was to investigate whether a rule-based natural language processing (NLP) system, applied to primary care clinical text data, could be used to monitor COVID-19 viral activity in Toronto, Canada. We employed a retrospective cohort design. We included primary care patients with a clinical encounter between January 1, 2020 and December 31, 2020 at one of 44 participating clinical sites. During the study timeframe, Toronto experienced a first COVID-19 outbreak between March 2020 and June 2020, followed by a second viral resurgence from October 2020 through December 2020. We used an expert-derived dictionary, pattern-matching tools, and a contextual analyzer to classify primary care documents as 1) COVID-19 positive, 2) COVID-19 negative, or 3) unknown COVID-19 status. We applied the COVID-19 biosurveillance system across three primary care electronic medical record text streams: 1) lab text, 2) health condition diagnosis text, and 3) clinical notes. We enumerated COVID-19 entities in the clinical text and estimated the proportion of patients with a positive COVID-19 record. We constructed a primary care COVID-19 NLP-derived time series and investigated its correlation with independent/external public health series: 1) lab-confirmed COVID-19 cases, 2) COVID-19 hospitalizations, 3) COVID-19 ICU admissions, and 4) COVID-19 intubations. A total of 196,440 unique patients were observed over the study timeframe, of whom 4,580 (2.3%) had at least one positive COVID-19 document in their primary care electronic medical record. Our NLP-derived COVID-19 time series, describing the temporal dynamics of COVID-19 positivity status over the study timeframe, demonstrated a pattern/trend that strongly mirrored that of the external public health series under investigation. We conclude that primary care text data passively collected from electronic medical record systems represent a high-quality, low-cost source of information for monitoring/surveilling COVID-19 impacts on community health. |
format | Online Article Text |
id | pubmed-9931279 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9931279 2023-02-16 PLOS Digit Health Research Article Public Library of Science 2022-12-07 /pmc/articles/PMC9931279/ /pubmed/36812606 http://dx.doi.org/10.1371/journal.pdig.0000150 Text en © 2022 Meaney et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Using Primary Care Clinical Text Data and Natural Language Processing to Identify Indicators of COVID-19 in Toronto, Canada |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931279/ https://www.ncbi.nlm.nih.gov/pubmed/36812606 http://dx.doi.org/10.1371/journal.pdig.0000150 |
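
The description field above outlines a rule-based pipeline: a term dictionary, pattern matching, and a context check that assign each document to one of three classes, followed by a per-patient positivity proportion. A minimal Python sketch of that idea follows. The term lists, cue words, and the function names `classify_document` and `proportion_positive` are all hypothetical and chosen for illustration; the study's actual expert-derived dictionary and contextual analyzer are not given in this record.

```python
import re

# Hypothetical dictionary terms; the study's expert-derived dictionary
# is not reproduced in this record.
COVID_TERMS = re.compile(
    r"\b(covid(-|\s)?19|sars(-|\s)?cov(-|\s)?2|coronavirus)\b", re.IGNORECASE
)
# Hypothetical context cues standing in for a contextual analyzer.
POSITIVE_CUES = re.compile(r"\b(positive|detected|confirmed)\b", re.IGNORECASE)
NEGATIVE_CUES = re.compile(
    r"\b(negative|not detected|ruled out|no evidence of)\b", re.IGNORECASE
)


def classify_document(text: str) -> str:
    """Return 'positive', 'negative', or 'unknown' for one document."""
    if not COVID_TERMS.search(text):
        return "unknown"
    # Check negative cues first so "not detected" is not read as "detected".
    if NEGATIVE_CUES.search(text):
        return "negative"
    if POSITIVE_CUES.search(text):
        return "positive"
    return "unknown"


def proportion_positive(labels_by_patient: dict) -> float:
    """Share of patients with at least one positive document."""
    if not labels_by_patient:
        return 0.0
    positive = sum(
        1 for labels in labels_by_patient.values() if "positive" in labels
    )
    return positive / len(labels_by_patient)


# Illustrative documents (invented, not study data).
docs = {
    "p1": ["COVID-19 PCR: positive", "Follow-up for knee pain"],
    "p2": ["SARS-CoV-2 not detected"],
    "p3": ["Discussed COVID-19 precautions"],
    "p4": ["Annual physical"],
}
labels = {pid: [classify_document(d) for d in texts] for pid, texts in docs.items()}
share = proportion_positive(labels)
```

As an arithmetic check on the reported figure, 4,580 of 196,440 patients is `4580 / 196440`, which rounds to 2.3%. A production system would need far more careful context handling (negation scope, family history, hypothetical mentions) than this keyword check provides.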