Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing

BACKGROUND: Timely detection of atrial fibrillation (AF) after stroke is highly clinically relevant, aiding decisions on the optimal strategies for secondary prevention of stroke. In the context of limited medical resources, it is crucial to set the right priorities of extended heart rhythm monitori...

Bibliographic Details
Main Authors:	Sung, Sheng-Feng, Sung, Kuan-Lin, Pan, Ru-Chiou, Lee, Pei-Ju, Hu, Ya-Han
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Cardiovascular Medicine
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372298/ https://www.ncbi.nlm.nih.gov/pubmed/35966534 http://dx.doi.org/10.3389/fcvm.2022.941237

author	Sung, Sheng-Feng Sung, Kuan-Lin Pan, Ru-Chiou Lee, Pei-Ju Hu, Ya-Han
author_facet	Sung, Sheng-Feng Sung, Kuan-Lin Pan, Ru-Chiou Lee, Pei-Ju Hu, Ya-Han
author_sort	Sung, Sheng-Feng
collection	PubMed
description	BACKGROUND: Timely detection of atrial fibrillation (AF) after stroke is highly clinically relevant, aiding decisions on the optimal strategies for secondary prevention of stroke. In the context of limited medical resources, it is crucial to set the right priorities of extended heart rhythm monitoring by stratifying patients into different risk groups likely to have newly detected AF (NDAF). This study aimed to develop an electronic health record (EHR)-based machine learning model to assess the risk of NDAF in an early stage after stroke. METHODS: Linked data between a hospital stroke registry and a deidentified research-based database including EHRs and administrative claims data was used. Demographic features, physiological measurements, routine laboratory results, and clinical free text were extracted from EHRs. The extreme gradient boosting algorithm was used to build the prediction model. The prediction performance was evaluated by the C-index and was compared to that of the AS5F and CHASE-LESS scores. RESULTS: The study population consisted of a training set of 4,064 and a temporal test set of 1,492 patients. During a median follow-up of 10.2 months, the incidence rate of NDAF was 87.0 per 1,000 person-year in the test set. On the test set, the model based on both structured and unstructured data achieved a C-index of 0.840, which was significantly higher than those of the AS5F (0.779, p = 0.023) and CHASE-LESS (0.768, p = 0.005) scores. CONCLUSIONS: It is feasible to build a machine learning model to assess the risk of NDAF based on EHR data available at the time of hospital admission. Inclusion of information derived from clinical free text can significantly improve the model performance and may outperform risk scores developed using traditional statistical methods. Further studies are needed to assess the clinical usefulness of the prediction model.
format	Online Article Text
id	pubmed-9372298
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9372298 2022-08-13 Sung, Sheng-Feng Sung, Kuan-Lin Pan, Ru-Chiou Lee, Pei-Ju Hu, Ya-Han Front Cardiovasc Med Cardiovascular Medicine Frontiers Media S.A. 2022-07-29 /pmc/articles/PMC9372298/ /pubmed/35966534 http://dx.doi.org/10.3389/fcvm.2022.941237 Text en Copyright © 2022 Sung, Sung, Pan, Lee and Hu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing
topic	Cardiovascular Medicine
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372298/ https://www.ncbi.nlm.nih.gov/pubmed/35966534 http://dx.doi.org/10.3389/fcvm.2022.941237