
2182: Developing a corpus for natural language processing to identify bleeding complications among intensive care unit patients



Bibliographic Details
Main authors:	Shah, Rashmee; Steinberg, Benjamin; Bucher, Brian; Chapman, Alec; Lloyd-Jones, Donald; Rondina, Matthew; Chapman, Wendy
Format:	Online Article Text
Language:	English
Published:	Cambridge University Press, 2018
Subjects:	Biomedical Informatics/Health Informatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6799302/ http://dx.doi.org/10.1017/cts.2017.60

author	Shah, Rashmee; Steinberg, Benjamin; Bucher, Brian; Chapman, Alec; Lloyd-Jones, Donald; Rondina, Matthew; Chapman, Wendy
collection	PubMed
description	OBJECTIVES/SPECIFIC AIMS: An accurate method to identify bleeding in large populations does not exist. Our goal was to explore how bleeding is represented in clinical text in order to develop a natural language processing (NLP) approach that automatically identifies bleeding events in clinical notes. METHODS/STUDY POPULATION: We used publicly available notes for ICU patients at high risk of bleeding (n=98,586 notes). Two physicians reviewed randomly selected notes and annotated all direct references to bleeding as "bleeding present" (BP) or "bleeding absent" (BA). Annotations were made at the mention level (if one specific sentence or phrase indicated BP or BA) and at the note level (if the note as a whole indicated BP or BA). A third physician adjudicated discordant annotations. RESULTS/ANTICIPATED RESULTS: In 120 randomly selected notes, bleeding was mentioned 406 times using 76 distinct words. Inter-annotator agreement reached 89% by the last batch of 30 notes. Ten terms accounted for 65% of all bleeding mentions. We aggregated these terms into 16 common stems (e.g., "hemorr" for hemorrhagic and hemorrhage), which accounted for 90% of all 406 mentions. Of the 120 notes, 60% were classified as BP. The median number of stems was 5 (IQR 2, 9) in BP notes versus 0 (IQR 0, 1) in BA notes. Zero bleeding mentions in a note were associated with BA (OR 28, 95% CI 6.5, 127). With 40 true negatives and 2 false negatives, the negative predictive value (NPV) of zero bleeding mentions was 95%. DISCUSSION/SIGNIFICANCE OF IMPACT: Few bleeding-related terms are used in clinical practice. The absence of these terms has a high NPV for the absence of bleeding. These results suggest that a high-throughput, rules-based NLP tool to identify bleeding is feasible.
format	Online Article Text
id	pubmed-6799302
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Cambridge University Press
record_format	MEDLINE/PubMed
title	2182: Developing a corpus for natural language processing to identify bleeding complications among intensive care unit patients
topic	Biomedical Informatics/Health Informatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6799302/ http://dx.doi.org/10.1017/cts.2017.60
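The note-level rule behind the reported NPV is that a note with zero bleeding mentions is predicted "bleeding absent". That rule, together with the NPV arithmetic, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' tool.

The abstract names only one of the 16 stems ("hemorr"). The other stem below is a hypothetical placeholder, and all function names are invented for illustration. Matching is a simple prefix test on lowercase word tokens with no negation handling. As a result, a negated phrase such as "no bleeding" still counts as a mention, which is consistent with the abstract counting all direct references to bleeding.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical stem list: only "hemorr" is named in the abstract; "bleed" is
# an illustrative placeholder, not one of the study's confirmed 16 stems.
BLEEDING_STEMS: Tuple[str, ...] = ("hemorr", "bleed")


def count_stem_mentions(note: str, stems: Iterable[str] = BLEEDING_STEMS) -> int:
    """Count lowercase word tokens that start with any bleeding stem.

    No negation handling: "no bleeding" still counts as one mention.
    """
    prefixes = tuple(stems)
    tokens = re.findall(r"[a-z]+", note.lower())
    return sum(1 for tok in tokens if tok.startswith(prefixes))


def predict_bleeding_absent(note: str) -> bool:
    """Zero-mention rule: a note with no stem matches is predicted BA."""
    return count_stem_mentions(note) == 0


def confusion_for_zero_mention_rule(
    labeled_notes: List[Tuple[str, str]],
) -> Tuple[int, int]:
    """Return (true negatives, false negatives) for the zero-mention rule.

    Each item is (note text, annotated label), where the label is "BP" or "BA".
    """
    tn = fn = 0
    for text, label in labeled_notes:
        if predict_bleeding_absent(text):
            if label == "BA":
                tn += 1  # predicted absent, annotated absent
            else:
                fn += 1  # predicted absent, but annotated present
    return tn, fn


def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN)."""
    return true_negatives / (true_negatives + false_negatives)


if __name__ == "__main__":
    # Made-up example notes, for illustration only.
    notes = [
        ("Large retroperitoneal hemorrhage; transfused.", "BP"),
        ("Stable overnight, no acute events.", "BA"),
        ("Oozing at line site overnight.", "BP"),  # missed: no stem matches
    ]
    print(confusion_for_zero_mention_rule(notes))
    # Counts reported in the abstract: 40 true negatives, 2 false negatives.
    print(round(negative_predictive_value(40, 2), 2))
```

With the counts reported in the abstract, `negative_predictive_value(40, 2)` evaluates to 40/42 ≈ 0.952, which rounds to the reported 95%. The third sample note shows how a false negative arises: bleeding described with a word outside the stem list is never matched.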

