A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital
Main authors: | Wang, Lijing; Zipursky, Amy R; Geva, Alon; McMurry, Andrew J; Mandl, Kenneth D; Miller, Timothy A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10322650/; https://www.ncbi.nlm.nih.gov/pubmed/37425487; http://dx.doi.org/10.1093/jamiaopen/ooad047 |
author | Wang, Lijing; Zipursky, Amy R; Geva, Alon; McMurry, Andrew J; Mandl, Kenneth D; Miller, Timothy A |
collection | PubMed |
description | OBJECTIVE: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR). MATERIALS AND METHODS: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier. RESULTS: On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases without in-hospital SARS-CoV2 lab tests as positive, and only 177 of those cases had the ICD-10 code for COVID-19. DISCUSSION: Performance on the proxy dataset may be worse because its instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned. CONCLUSION: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts. |
format | Online Article Text |
id | pubmed-10322650 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
source | JAMIA Open, Research and Applications; Oxford University Press, 2023-07-05 (record pubmed-10322650, 2023-07-07); /pmc/articles/PMC10322650/; /pubmed/37425487; http://dx.doi.org/10.1093/jamiaopen/ooad047 |
rights | © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital |
topic | Research and Applications |
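
As a quick consistency check on the proxy-dataset scores reported in the abstract, the standard F1 definition (the harmonic mean of precision and recall) can be applied to the reported precision (0.6) and recall (0.52). This is a minimal sketch, not code from the paper; the function name `f1_score` is illustrative. Because the reported precision and recall are themselves rounded, the result can only approximately reproduce the reported F1 of 0.56.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1).

    Illustrative helper, not taken from the paper. Returns 0.0 when both
    inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall reported for SARS-CoV2 positive cases
# on the test split of the proxy dataset.
reported_precision = 0.6
reported_recall = 0.52

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints: F1 = 0.557, which rounds to the reported 0.56
```

Since 2 × 0.6 × 0.52 / (0.6 + 0.52) = 0.624 / 1.12 ≈ 0.557, the three reported scores are mutually consistent to two decimal places.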