A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital

OBJECTIVE: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). MATERIALS AND METHODS: Statistical classifiers were trained on feature representations deri...

Bibliographic Details
Main authors: Wang, Lijing; Zipursky, Amy R; Geva, Alon; McMurry, Andrew J; Mandl, Kenneth D; Miller, Timothy A
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10322650/
https://www.ncbi.nlm.nih.gov/pubmed/37425487
http://dx.doi.org/10.1093/jamiaopen/ooad047
author Wang, Lijing
Zipursky, Amy R
Geva, Alon
McMurry, Andrew J
Mandl, Kenneth D
Miller, Timothy A
collection PubMed
description OBJECTIVE: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). MATERIALS AND METHODS: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. RESULTS: On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases as not having SARS-CoV2 lab tests in hospital, and only 177 of those cases had the ICD-10 code for COVID-19. DISCUSSION: Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. CONCLUSION: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.
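The figures reported in the description can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the article: the helper names `f1_score` and `percent` are hypothetical, and it assumes the reported F1 is the usual harmonic mean of precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percent(numerator: int, denominator: int) -> float:
    """Share of numerator in denominator, as a percentage with one decimal."""
    return round(100 * numerator / denominator, 1)


# Proxy-dataset test split: precision 0.6, recall 0.52 (from the description).
reported_f1 = round(f1_score(0.6, 0.52), 2)

# Expert validation counts as printed in the description.
positive_agreement = percent(81, 84)
negative_agreement = percent(91, 93)

print(reported_f1, positive_agreement, negative_agreement)
```

Under these assumptions, 0.6 precision and 0.52 recall give an F1 that rounds to 0.56, matching the description, and 91/93 rounds to 97.8%. The fraction 81/84 rounds to 96.4% rather than the 97.6% printed alongside it; this sketch cannot tell which of the two figures is the intended one.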
format Online
Article
Text
id pubmed-10322650
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10322650 2023-07-07 A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital Wang, Lijing Zipursky, Amy R Geva, Alon McMurry, Andrew J Mandl, Kenneth D Miller, Timothy A JAMIA Open Research and Applications
Oxford University Press 2023-07-05 /pmc/articles/PMC10322650/ /pubmed/37425487 http://dx.doi.org/10.1093/jamiaopen/ooad047 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10322650/
https://www.ncbi.nlm.nih.gov/pubmed/37425487
http://dx.doi.org/10.1093/jamiaopen/ooad047