1765. Use of a Natural Language Processing-Based Informatics Pipeline for Infectious Disease Syndrome Surveillance


Bibliographic Details
Main authors: Zachariah, Philip; Hill-Ricciuti, Alexandra; Natarajan, Karthik
Format: Online article, text
Language: English
Published: Oxford University Press 2018
Subjects: Abstracts
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6253153/
http://dx.doi.org/10.1093/ofid/ofy209.150
Collection: PubMed
Description
BACKGROUND: Automated surveillance for infectious disease syndromes (IDS) in hospitals relies mostly on structured data (e.g., diagnosis codes). Natural language processing (NLP) enables screening and concept extraction from large repositories of unstructured data (e.g., clinician notes). We demonstrate the use of an NLP-based pipeline to improve case finding for a specific IDS, urinary tract infection (UTI), and compare it with surveillance using ICD-10 codes.

METHODS: Inpatient hospitalizations at a children’s hospital in 2016 with ICD-10 codes for UTI were identified. Records of inpatients with positive urine cultures in 2016 were reviewed to identify missed cases. Notes from 2016 inpatient hospitalizations were processed using an NLP pipeline. The pipeline receives real-time data, accounts for institution-specific document structure, performs named-entity recognition on clinical problems and symptoms, and maps these terms to concept unique identifiers (CUIs) in the Unified Medical Language System (UMLS). We used the UMLS CUI for urinary tract infection (C0042029) to identify notes of interest. To minimize false positives, we set the threshold for case positivity at the mean number of UTI CUI mentions per patient during 2016.

RESULTS: Among 10,681 hospitalized patients, 181 unique patients were identified with UTI using ICD-10 codes. Chart review of positive urine cultures (n = 409) identified an additional 85 UTI cases. The NLP pipeline screened a total of 289,344 notes to identify patients with UTI. Using the predefined threshold (n = 6), the NLP-based method detected all UTI cases identified by ICD-10 screening. Of the 85 additional cases missed by ICD-10 codes, 84 (98.8%) were positive by the NLP-based method. Identifying these 84 true cases would require reviewing an additional 275 charts without UTI that the NLP method flagged as positive (a ratio of roughly 1:3).

CONCLUSION: We demonstrate the use of an NLP-based pipeline to enhance IDS surveillance. Combining NLP-based surveillance with other methods could facilitate case detection and outbreak control for IDS that lack microbiologic data or have novel presentations. Further work will improve the specificity of NLP-based case-finding methods and apply them to other IDS.

DISCLOSURES: All authors: No reported disclosures.
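The case-finding rule described in the abstract (count mentions of the UTI CUI per patient, flag patients at or above the mean count, then compare the flags against ICD-10 cases and chart-review cases) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes an upstream NER and UMLS-mapping step (hypothetical, not shown) has already turned each note into a list of CUIs, that the mean is taken over patients with at least one mention, that a patient is flagged when the count is greater than or equal to the threshold, and that a false positive is a flagged patient in neither reference set. All function names, the `"OTHER"` placeholder CUI, and the toy patients are hypothetical.

```python
from collections import Counter
from statistics import mean

# UMLS CUI for urinary tract infection, as given in the abstract.
UTI_CUI = "C0042029"


def count_cui_mentions(notes, cui=UTI_CUI):
    """Return a Counter of how many times `cui` is mentioned per patient.

    `notes` is an iterable of (patient_id, cuis) pairs, one per note, where
    `cuis` is the list of CUIs extracted from that note by an upstream
    NER + UMLS mapping step (hypothetical, not shown here).
    """
    counts = Counter()
    for patient_id, cuis in notes:
        n = cuis.count(cui)  # mentions of the target concept in this note
        if n:
            counts[patient_id] += n
    return counts


def mean_threshold(counts):
    """Mean mentions per patient (assumption: over patients with >= 1 mention)."""
    return mean(list(counts.values()))


def flag_patients(counts, threshold):
    """Flag patients whose mention count is at or above the threshold (>= is assumed)."""
    return {pid for pid, n in counts.items() if n >= threshold}


def evaluate(flagged, icd10_cases, chart_review_cases):
    """Compare NLP flags against the ICD-10 and chart-review case sets."""
    missed_by_icd10 = chart_review_cases - icd10_cases
    true_cases = icd10_cases | chart_review_cases
    return {
        "icd10_detected": len(flagged & icd10_cases),
        "icd10_total": len(icd10_cases),
        "missed_detected": len(flagged & missed_by_icd10),
        "missed_total": len(missed_by_icd10),
        # Assumed definition: flagged patients found in neither reference set.
        "false_positives": len(flagged - true_cases),
    }


def review_ratio(true_positives, false_positives):
    """Charts without the syndrome that must be reviewed per true case found."""
    return false_positives / true_positives


# Toy data (hypothetical); "OTHER" stands in for any non-UTI CUI.
notes = [
    ("p1", [UTI_CUI, UTI_CUI, "OTHER"]),
    ("p1", [UTI_CUI, UTI_CUI]),
    ("p2", [UTI_CUI]),
    ("p3", ["OTHER"]),
    ("p4", [UTI_CUI] * 5),
    ("p5", [UTI_CUI] * 4),
]
counts = count_cui_mentions(notes)
threshold = mean_threshold(counts)  # (4 + 1 + 5 + 4) / 4 = 3.5
flagged = flag_patients(counts, threshold)
print(sorted(flagged))  # ['p1', 'p4', 'p5']
print(evaluate(flagged, icd10_cases={"p1"}, chart_review_cases={"p2", "p4"}))
# Figures reported in the abstract: 84 true cases, 275 false-positive charts.
print(round(review_ratio(84, 275), 2))  # 3.27
```

Applied to the counts reported in the abstract, the same arithmetic gives 84/85 ≈ 0.988 of the ICD-10-missed cases detected and about 3.3 false-positive charts per true case (275/84), which the abstract rounds to roughly 1:3. Using the mean as the cutoff is a single global threshold; per-note or per-section weighting would be a different design and is not described in the source.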
ID: pubmed-6253153
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Open Forum Infect Dis (Abstracts), Oxford University Press, published 2018-11-26 (record pubmed-6253153, dated 2018-11-28)
Copyright: © The Author(s) 2018. Published by Oxford University Press on behalf of Infectious Diseases Society of America.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com