Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care
OBJECTIVES: To evaluate whether different approaches in note text preparation (known as preprocessing) can impact machine learning model performance in the case of mortality prediction in the ICU. DESIGN: Clinical note text was used to build machine learning models for adults admitted to the ICU. Preproces...
Main Authors: Mahendra, Malini; Luo, Yanting; Mills, Hunter; Schenk, Gundolf; Butte, Atul J.; Dudley, R. Adams
Format: Online Article Text
Language: English
Published: Lippincott Williams & Wilkins, 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8202578/ https://www.ncbi.nlm.nih.gov/pubmed/34136824 http://dx.doi.org/10.1097/CCE.0000000000000450
_version_ | 1783708011387158528 |
author | Mahendra, Malini Luo, Yanting Mills, Hunter Schenk, Gundolf Butte, Atul J. Dudley, R. Adams |
author_facet | Mahendra, Malini Luo, Yanting Mills, Hunter Schenk, Gundolf Butte, Atul J. Dudley, R. Adams |
author_sort | Mahendra, Malini |
collection | PubMed |
description | OBJECTIVES: To evaluate whether different approaches in note text preparation (known as preprocessing) can impact machine learning model performance in the case of mortality prediction in the ICU. DESIGN: Clinical note text was used to build machine learning models for adults admitted to the ICU. Preprocessing strategies studied were none (raw text), cleaning text, stemming, term frequency-inverse document frequency vectorization, and creation of n-grams. Model performance was assessed by the area under the receiver operating characteristic curve. Models were trained and internally validated on University of California San Francisco data using 10-fold cross validation. These models were then externally validated on Beth Israel Deaconess Medical Center data. SETTING: ICUs at University of California San Francisco and Beth Israel Deaconess Medical Center. SUBJECTS: Ten thousand patients in the University of California San Francisco training and internal testing dataset and 27,058 patients in the external validation dataset, Beth Israel Deaconess Medical Center. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Mortality rate at Beth Israel Deaconess Medical Center and University of California San Francisco was 10.9% and 7.4%, respectively. Data are presented as area under the receiver operating characteristic curve (95% CI) for models validated at University of California San Francisco and area under the receiver operating characteristic curve for models validated at Beth Israel Deaconess Medical Center. Models built and trained on University of California San Francisco data for the prediction of inhospital mortality improved from the raw note text model (AUROC, 0.84; CI, 0.80–0.89) to the term frequency-inverse document frequency model (AUROC, 0.89; CI, 0.85–0.94). When applying the models developed at University of California San Francisco to Beth Israel Deaconess Medical Center data, there was a similar increase in model performance from raw note text (area under the receiver operating characteristic curve at Beth Israel Deaconess Medical Center: 0.72) to the term frequency-inverse document frequency model (area under the receiver operating characteristic curve at Beth Israel Deaconess Medical Center: 0.83). CONCLUSIONS: Differences in preprocessing strategies for note text impacted model discrimination. Completing a preprocessing pathway including cleaning, stemming, and term frequency-inverse document frequency vectorization resulted in the preprocessing strategy with the greatest improvement in model performance. Further study is needed, with particular emphasis on how to manage author implicit bias present in note text, before natural language processing algorithms are implemented in the clinical setting. |
format | Online Article Text |
id | pubmed-8202578 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Lippincott Williams & Wilkins |
record_format | MEDLINE/PubMed |
spelling | pubmed-8202578 2021-06-15 Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care Mahendra, Malini Luo, Yanting Mills, Hunter Schenk, Gundolf Butte, Atul J. Dudley, R. Adams Crit Care Explor Original Clinical Report OBJECTIVES: To evaluate whether different approaches in note text preparation (known as preprocessing) can impact machine learning model performance in the case of mortality prediction in the ICU. DESIGN: Clinical note text was used to build machine learning models for adults admitted to the ICU. Preprocessing strategies studied were none (raw text), cleaning text, stemming, term frequency-inverse document frequency vectorization, and creation of n-grams. Model performance was assessed by the area under the receiver operating characteristic curve. Models were trained and internally validated on University of California San Francisco data using 10-fold cross validation. These models were then externally validated on Beth Israel Deaconess Medical Center data. SETTING: ICUs at University of California San Francisco and Beth Israel Deaconess Medical Center. SUBJECTS: Ten thousand patients in the University of California San Francisco training and internal testing dataset and 27,058 patients in the external validation dataset, Beth Israel Deaconess Medical Center. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Mortality rate at Beth Israel Deaconess Medical Center and University of California San Francisco was 10.9% and 7.4%, respectively. Data are presented as area under the receiver operating characteristic curve (95% CI) for models validated at University of California San Francisco and area under the receiver operating characteristic curve for models validated at Beth Israel Deaconess Medical Center. Models built and trained on University of California San Francisco data for the prediction of inhospital mortality improved from the raw note text model (AUROC, 0.84; CI, 0.80–0.89) to the term frequency-inverse document frequency model (AUROC, 0.89; CI, 0.85–0.94). When applying the models developed at University of California San Francisco to Beth Israel Deaconess Medical Center data, there was a similar increase in model performance from raw note text (area under the receiver operating characteristic curve at Beth Israel Deaconess Medical Center: 0.72) to the term frequency-inverse document frequency model (area under the receiver operating characteristic curve at Beth Israel Deaconess Medical Center: 0.83). CONCLUSIONS: Differences in preprocessing strategies for note text impacted model discrimination. Completing a preprocessing pathway including cleaning, stemming, and term frequency-inverse document frequency vectorization resulted in the preprocessing strategy with the greatest improvement in model performance. Further study is needed, with particular emphasis on how to manage author implicit bias present in note text, before natural language processing algorithms are implemented in the clinical setting. Lippincott Williams & Wilkins 2021-06-11 /pmc/articles/PMC8202578/ /pubmed/34136824 http://dx.doi.org/10.1097/CCE.0000000000000450 Text en Copyright © 2021 The Authors. Published by Wolters Kluwer Health, Inc. on behalf of the Society of Critical Care Medicine. 
https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal. |
spellingShingle | Original Clinical Report Mahendra, Malini Luo, Yanting Mills, Hunter Schenk, Gundolf Butte, Atul J. Dudley, R. Adams Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care |
title | Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care |
title_full | Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care |
title_fullStr | Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care |
title_full_unstemmed | Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care |
title_short | Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care |
title_sort | impact of different approaches to preparing notes for analysis with natural language processing on the performance of prediction models in intensive care |
topic | Original Clinical Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8202578/ https://www.ncbi.nlm.nih.gov/pubmed/34136824 http://dx.doi.org/10.1097/CCE.0000000000000450 |
work_keys_str_mv | AT mahendramalini impactofdifferentapproachestopreparingnotesforanalysiswithnaturallanguageprocessingontheperformanceofpredictionmodelsinintensivecare AT luoyanting impactofdifferentapproachestopreparingnotesforanalysiswithnaturallanguageprocessingontheperformanceofpredictionmodelsinintensivecare AT millshunter impactofdifferentapproachestopreparingnotesforanalysiswithnaturallanguageprocessingontheperformanceofpredictionmodelsinintensivecare AT schenkgundolf impactofdifferentapproachestopreparingnotesforanalysiswithnaturallanguageprocessingontheperformanceofpredictionmodelsinintensivecare AT butteatulj impactofdifferentapproachestopreparingnotesforanalysiswithnaturallanguageprocessingontheperformanceofpredictionmodelsinintensivecare AT dudleyradams impactofdifferentapproachestopreparingnotesforanalysiswithnaturallanguageprocessingontheperformanceofpredictionmodelsinintensivecare |
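The abstract in this record describes a preprocessing pathway of text cleaning, stemming, term frequency-inverse document frequency (TF-IDF) vectorization, and n-gram creation, with models scored by 10-fold cross-validated AUROC. Below is a minimal, hypothetical sketch of such a pipeline using scikit-learn and NLTK; the toy notes, labels, and the choice of logistic regression are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the preprocessing pathway described in the abstract:
# clean text -> stem -> TF-IDF vectorization with n-grams -> classifier,
# scored by 10-fold cross-validated AUROC. Toy data and logistic regression
# are assumptions for illustration, not the authors' actual pipeline.
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()

def clean_and_stem(text: str) -> str:
    """Lowercase the note, keep alphabetic tokens only, and stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(tok) for tok in tokens)

# Placeholder note texts and in-hospital mortality labels (1 = died, 0 = survived);
# the study used real ICU notes from UCSF and Beth Israel Deaconess Medical Center.
notes = (
    ["hypotensive despite fluids, escalating vasopressors, worsening lactate"] * 10
    + ["stable overnight, tolerating diet, anticipate transfer to the floor"] * 10
)
labels = [1] * 10 + [0] * 10

model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_and_stem, ngram_range=(1, 2)),  # unigrams + bigrams
    LogisticRegression(max_iter=1000),
)

# 10-fold cross-validated AUROC, mirroring the internal validation described above.
auroc_scores = cross_val_score(model, notes, labels, cv=10, scoring="roc_auc")
print(f"Mean AUROC: {auroc_scores.mean():.2f}")
```

This sketch only mirrors the preprocessing and internal cross-validation steps; in the study itself, models were trained on 10,000 UCSF ICU stays and externally validated on 27,058 Beth Israel Deaconess Medical Center stays.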