Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients

IMPORTANCE: To improve patient safety, health care systems need reliable methods to detect adverse events in large patient populations. Events are often described in clinical notes, rather than structured data, which makes them difficult to identify on a large scale. OBJECTIVE: To develop and compare...

Bibliographic Details
Main Authors:	Taggart, Maxwell, Chapman, Wendy W., Steinberg, Benjamin A., Ruckel, Shane, Pregenzer-Wenzler, Arianna, Du, Yishuai, Ferraro, Jeffrey, Bucher, Brian T., Lloyd-Jones, Donald M., Rondina, Matthew T., Shah, Rashmee U.
Format:	Online Article Text
Language:	English
Published:	American Medical Association 2018
Subjects:	Original Investigation
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324448/ https://www.ncbi.nlm.nih.gov/pubmed/30646240 http://dx.doi.org/10.1001/jamanetworkopen.2018.3451

_version_	1783385974995156992
author	Taggart, Maxwell Chapman, Wendy W. Steinberg, Benjamin A. Ruckel, Shane Pregenzer-Wenzler, Arianna Du, Yishuai Ferraro, Jeffrey Bucher, Brian T. Lloyd-Jones, Donald M. Rondina, Matthew T. Shah, Rashmee U.
author_facet	Taggart, Maxwell Chapman, Wendy W. Steinberg, Benjamin A. Ruckel, Shane Pregenzer-Wenzler, Arianna Du, Yishuai Ferraro, Jeffrey Bucher, Brian T. Lloyd-Jones, Donald M. Rondina, Matthew T. Shah, Rashmee U.
author_sort	Taggart, Maxwell
collection	PubMed
description	IMPORTANCE: To improve patient safety, health care systems need reliable methods to detect adverse events in large patient populations. Events are often described in clinical notes, rather than structured data, which makes them difficult to identify on a large scale. OBJECTIVE: To develop and compare 2 natural language processing methods, a rules-based approach and a machine learning (ML) approach, for identifying bleeding events in clinical notes. DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study used deidentified notes from the Medical Information Mart for Intensive Care, which spans 2001 to 2012. A training set of 990 notes and a test set of 660 notes were randomly selected. Physicians classified each note as present or absent for a clinically relevant bleeding event during the hospitalization. A bleeding dictionary was developed for the rules-based approach; bleeding mentions were then aggregated to arrive at a classification for each note. Three ML models (support vector machine, extra trees, and convolutional neural network) were developed and trained using the 990-note training set. Another instance of each ML model was also trained on a sample of 450 notes, with equal numbers of bleeding-present and bleeding-absent notes. The notes were represented using term frequency–inverse document frequency vectors and global vectors for word representation. MAIN OUTCOMES AND MEASURES: The main outcomes were accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each model. Following training, the models were tested on the test set and sensitivities were compared using a McNemar test. RESULTS: The 990-note training set represented 769 patients (296 [38.5%] female; mean [SD] age, 67.42 [14.7] years). The 660-note test set represented 527 patients (211 [40.0%] female; mean [SD] age, 67.86 [14.7] years). Bleeding was present in 146 notes (22.1%). The extra trees down-sampled model and rules-based approaches were similarly sensitive (93.8% vs 91.1%; difference, 2.7%; 95% CI, −3.8% to 7.9%; P = .44). The positive predictive value for the extra trees model, however, was 48.6%. The rules-based model had the best performance overall, with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value. CONCLUSIONS AND RELEVANCE: Bleeding is a common complication in health care, and these results demonstrate an automated and scalable detection method. The rules-based natural language processing approach, compared with ML, had the best performance in identifying bleeding, with high sensitivity and negative predictive value.
format	Online Article Text
id	pubmed-6324448
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	American Medical Association
record_format	MEDLINE/PubMed
spelling	pubmed-6324448 2019-01-22 Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients Taggart, Maxwell Chapman, Wendy W. Steinberg, Benjamin A. Ruckel, Shane Pregenzer-Wenzler, Arianna Du, Yishuai Ferraro, Jeffrey Bucher, Brian T. Lloyd-Jones, Donald M. Rondina, Matthew T. Shah, Rashmee U. JAMA Netw Open Original Investigation American Medical Association 2018-10-12 /pmc/articles/PMC6324448/ /pubmed/30646240 http://dx.doi.org/10.1001/jamanetworkopen.2018.3451 Text en Copyright 2018 Taggart M et al. JAMA Network Open. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
spellingShingle	Original Investigation Taggart, Maxwell Chapman, Wendy W. Steinberg, Benjamin A. Ruckel, Shane Pregenzer-Wenzler, Arianna Du, Yishuai Ferraro, Jeffrey Bucher, Brian T. Lloyd-Jones, Donald M. Rondina, Matthew T. Shah, Rashmee U. Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients
title	Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients
title_full	Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients
title_fullStr	Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients
title_full_unstemmed	Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients
title_short	Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients
title_sort	comparison of 2 natural language processing methods for identification of bleeding among critically ill patients
topic	Original Investigation
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324448/ https://www.ncbi.nlm.nih.gov/pubmed/30646240 http://dx.doi.org/10.1001/jamanetworkopen.2018.3451
work_keys_str_mv	AT taggartmaxwell comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT chapmanwendyw comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT steinbergbenjamina comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT ruckelshane comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT pregenzerwenzlerarianna comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT duyishuai comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT ferrarojeffrey comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT bucherbriant comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT lloydjonesdonaldm comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT rondinamatthewt comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients AT shahrashmeeu comparisonof2naturallanguageprocessingmethodsforidentificationofbleedingamongcriticallyillpatients
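
The outcome measures named in the abstract (accuracy, sensitivity, specificity, positive predictive value, and negative predictive value) all derive from a 2x2 confusion matrix. The sketch below shows that arithmetic. The abstract does not give cell counts; the counts used here (133 true positives, 13 false negatives, 435 true negatives, 79 false positives) are an assumption, inferred from the reported 146 bleeding-present notes among 660 test notes and the percentages reported for the rules-based model. The class and function names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix for a binary note classifier."""
    tp: int  # bleeding-present notes flagged as bleeding
    fp: int  # bleeding-absent notes flagged as bleeding
    tn: int  # bleeding-absent notes not flagged
    fn: int  # bleeding-present notes not flagged


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the five outcome measures as fractions between 0 and 1."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# ASSUMPTION: counts inferred from the abstract's percentages for the
# rules-based model (146 positives, 514 negatives); they are not stated
# in the record itself.
rules_based = ConfusionCounts(tp=133, fp=79, tn=435, fn=13)
metrics = diagnostic_metrics(rules_based)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these inferred counts, sensitivity, specificity, positive predictive value, and negative predictive value round to the 91.1%, 84.6%, 62.7%, and 97.1% reported for the rules-based model. The gap between the high negative predictive value and the moderate positive predictive value reflects the 22.1% prevalence of bleeding in the test set.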
