
Using Primary Care Clinical Text Data and Natural Language Processing to Identify Indicators of COVID-19 in Toronto, Canada

The objective of this study was to investigate whether a rule-based natural language processing (NLP) system, applied to primary care clinical text data, could be used to monitor COVID-19 viral activity in Toronto, Canada. We employed a retrospective cohort design. We included primary care patients...


Bibliographic Details
Main authors: Meaney, Christopher; Moineddin, Rahim; Kalia, Sumeet; Aliarzadeh, Babak; Greiver, Michelle
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931279/
https://www.ncbi.nlm.nih.gov/pubmed/36812606
http://dx.doi.org/10.1371/journal.pdig.0000150
author Meaney, Christopher
Moineddin, Rahim
Kalia, Sumeet
Aliarzadeh, Babak
Greiver, Michelle
collection PubMed
description The objective of this study was to investigate whether a rule-based natural language processing (NLP) system, applied to primary care clinical text data, could be used to monitor COVID-19 viral activity in Toronto, Canada. We employed a retrospective cohort design. We included primary care patients with a clinical encounter between January 1, 2020 and December 31, 2020 at one of 44 participating clinical sites. During the study timeframe, Toronto experienced a first COVID-19 outbreak between March 2020 and June 2020, followed by a second viral resurgence from October 2020 through December 2020. We used an expert-derived dictionary, pattern-matching tools and a contextual analyzer to classify primary care documents as 1) COVID-19 positive, 2) COVID-19 negative, or 3) unknown COVID-19 status. We applied the COVID-19 biosurveillance system across three primary care electronic medical record text streams: 1) lab text, 2) health condition diagnosis text and 3) clinical notes. We enumerated COVID-19 entities in the clinical text and estimated the proportion of patients with a positive COVID-19 record. We constructed a primary care COVID-19 NLP-derived time series and investigated its correlation with independent, external public health series: 1) lab-confirmed COVID-19 cases, 2) COVID-19 hospitalizations, 3) COVID-19 ICU admissions, and 4) COVID-19 intubations. A total of 196,440 unique patients were observed over the study timeframe, of whom 4,580 (2.3%) had at least one positive COVID-19 document in their primary care electronic medical record. Our NLP-derived COVID-19 time series, which describes the temporal dynamics of COVID-19 positivity status over the study timeframe, showed a trend that strongly mirrored those of the external public health series under investigation. We conclude that primary care text data passively collected from electronic medical record systems represent a high-quality, low-cost source of information for monitoring COVID-19 impacts on community health.
format Online
Article
Text
id pubmed-9931279
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9931279 2023-02-16 Using Primary Care Clinical Text Data and Natural Language Processing to Identify Indicators of COVID-19 in Toronto, Canada Meaney, Christopher Moineddin, Rahim Kalia, Sumeet Aliarzadeh, Babak Greiver, Michelle PLOS Digit Health Research Article Public Library of Science 2022-12-07 /pmc/articles/PMC9931279/ /pubmed/36812606 http://dx.doi.org/10.1371/journal.pdig.0000150 Text en © 2022 Meaney et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Using Primary Care Clinical Text Data and Natural Language Processing to Identify Indicators of COVID-19 in Toronto, Canada
topic Research Article
work_keys_str_mv AT meaneychristopher usingprimarycareclinicaltextdataandnaturallanguageprocessingtoidentifyindicatorsofcovid19intorontocanada
AT moineddinrahim usingprimarycareclinicaltextdataandnaturallanguageprocessingtoidentifyindicatorsofcovid19intorontocanada
AT kaliasumeet usingprimarycareclinicaltextdataandnaturallanguageprocessingtoidentifyindicatorsofcovid19intorontocanada
AT aliarzadehbabak usingprimarycareclinicaltextdataandnaturallanguageprocessingtoidentifyindicatorsofcovid19intorontocanada
AT greivermichelle usingprimarycareclinicaltextdataandnaturallanguageprocessingtoidentifyindicatorsofcovid19intorontocanada
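The description in this record outlines a rule-based approach: a term dictionary, pattern matching and a contextual analyzer that label each document as COVID-19 positive, negative or unknown, followed by a patient-level proportion. The following Python sketch illustrates that general idea only. The term lists, the sentence-level cue rule and the function names `classify_document` and `proportion_positive` are assumptions made for illustration; the study's actual dictionary and contextual rules are not given in this record.

```python
import re

# Hypothetical term lists for illustration only; the study's expert-derived
# dictionary and contextual rules are not reproduced in this record.
COVID_TERMS = re.compile(
    r"\b(covid[- ]?19|sars[- ]?cov[- ]?2|coronavirus)\b", re.IGNORECASE
)
NEGATIVE_CUES = re.compile(
    r"\b(negative|not detected|ruled out|no evidence of)\b", re.IGNORECASE
)
POSITIVE_CUES = re.compile(r"\b(positive|detected|confirmed)\b", re.IGNORECASE)


def classify_document(text):
    """Label a document 'positive', 'negative' or 'unknown' (assumed rule)."""
    labels = set()
    # Crude sentence split; a real contextual analyzer would be more careful.
    for sentence in re.split(r"[.;\n]+", text):
        if not COVID_TERMS.search(sentence):
            continue
        # Negative cues are checked first so "not detected" is not read
        # as the positive cue "detected".
        if NEGATIVE_CUES.search(sentence):
            labels.add("negative")
        elif POSITIVE_CUES.search(sentence):
            labels.add("positive")
    if "positive" in labels:
        return "positive"
    if "negative" in labels:
        return "negative"
    return "unknown"


def proportion_positive(documents_by_patient):
    """Share of patients with at least one document labelled positive."""
    if not documents_by_patient:
        return 0.0
    positive_patients = sum(
        any(classify_document(doc) == "positive" for doc in docs)
        for docs in documents_by_patient.values()
    )
    return positive_patients / len(documents_by_patient)
```

Under these assumptions, a patient counts as positive when any one of their documents is labelled positive, which matches the "at least one positive COVID-19 document" wording of the description. A sentence that mixes a negative and a positive cue is labelled negative by this simple rule, which is one reason a production system would need a fuller contextual analyzer.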