
Application of machine learning and natural language processing for predicting stroke-associated pneumonia

BACKGROUND: Identifying patients at high risk of stroke-associated pneumonia (SAP) may permit targeting potential interventions to reduce its incidence. We aimed to explore the functionality of machine learning (ML) and natural language processing techniques on structured data and unstructured clini...


Bibliographic Details
Main authors:	Tsai, Hui-Chu; Hsieh, Cheng-Yang; Sung, Sheng-Feng
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Public Health
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9556866/ https://www.ncbi.nlm.nih.gov/pubmed/36249261 http://dx.doi.org/10.3389/fpubh.2022.1009164

_version_	1784807170152333312
author	Tsai, Hui-Chu; Hsieh, Cheng-Yang; Sung, Sheng-Feng
collection	PubMed
description	BACKGROUND: Identifying patients at high risk of stroke-associated pneumonia (SAP) may permit targeting potential interventions to reduce its incidence. We aimed to explore the functionality of machine learning (ML) and natural language processing techniques on structured data and unstructured clinical text to predict SAP by comparing it to conventional risk scores. METHODS: Linked data between a hospital stroke registry and a deidentified research-based database including electronic health records and administrative claims data was used. Natural language processing was applied to extract textual features from clinical notes. The random forest algorithm was used to build ML models. The predictive performance of ML models was compared with the A²DS², ISAN, PNA, and ACDD⁴ scores using the area under the receiver operating characteristic curve (AUC). RESULTS: Among 5,913 acute stroke patients hospitalized between Oct 2010 and Sep 2021, 450 (7.6%) developed SAP within the first 7 days after stroke onset. The ML model based on both textual features and structured variables had the highest AUC [0.840, 95% confidence interval (CI) 0.806–0.875], significantly higher than those of the ML model based on structured variables alone (0.828, 95% CI 0.793–0.863, P = 0.040), ACDD⁴ (0.807, 95% CI 0.766–0.849, P = 0.041), A²DS² (0.803, 95% CI 0.762–0.845, P = 0.013), ISAN (0.795, 95% CI 0.752–0.837, P = 0.009), and PNA (0.778, 95% CI 0.735–0.822, P < 0.001). All models demonstrated adequate calibration except for the A²DS² score. CONCLUSIONS: The ML model based on both textual features and structured variables performed better than conventional risk scores in predicting SAP. The workflow used to generate ML prediction models can be disseminated for local adaptation by individual healthcare organizations.
format	Online Article Text
id	pubmed-9556866
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9556866 2022-10-14 Front Public Health Frontiers Media S.A. 2022-09-29 /pmc/articles/PMC9556866/ /pubmed/36249261 http://dx.doi.org/10.3389/fpubh.2022.1009164 Text en Copyright © 2022 Tsai, Hsieh and Sung. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Application of machine learning and natural language processing for predicting stroke-associated pneumonia
topic	Public Health
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9556866/ https://www.ncbi.nlm.nih.gov/pubmed/36249261 http://dx.doi.org/10.3389/fpubh.2022.1009164
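The abstract in this record compares models by the area under the receiver operating characteristic curve (AUC) and reports an SAP incidence of 7.6%. As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, and rechecks the incidence figure. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the authors' code or data; the study itself used random forest models whose implementation is not given in this record.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win, which
    makes this equal to the Mann-Whitney U statistic over n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study: 1 = developed SAP, 0 = did not.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_risk = [0.9, 0.2, 0.6, 0.6, 0.1, 0.4]
toy_auc = roc_auc(toy_labels, toy_risk)

# Incidence reported in the abstract: 450 of 5,913 patients.
incidence_pct = round(100 * 450 / 5913, 1)

print(f"toy AUC = {toy_auc:.3f}, incidence = {incidence_pct}%")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; for a cohort of the size described here, a rank-based implementation or an established library routine would be the more practical choice.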


Similar items