
Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation


Bibliographic Details
Main Authors:	Zhao, Yiqing; Fu, Sunyang; Bielinski, Suzette J; Decker, Paul A; Chamberlain, Alanna M; Roger, Veronique L; Liu, Hongfang; Larson, Nicholas B
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7985804/ https://www.ncbi.nlm.nih.gov/pubmed/33683212 http://dx.doi.org/10.2196/22951

collection	PubMed
description	BACKGROUND: Stroke is an important clinical outcome in cardiovascular research. However, incident stroke is typically ascertained through time-consuming manual chart abstraction. Current efforts to phenotype stroke from electronic health records focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events.
OBJECTIVE: The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing.
METHODS: The algorithm was trained and validated using an existing epidemiology cohort of 4914 patients with atrial fibrillation (AF) and manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) selected by stratified sampling from a population in Olmsted County, Minnesota (N=74,314).
RESULTS: Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 74%-93%) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample.
CONCLUSIONS: We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining the stroke subtype. The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.
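The predictive values in the RESULTS can be recomputed from the reported counts. The sketch below is only an illustration: the abstract does not say which method produced the confidence interval, so the Wilson score interval used here is an assumption, and the function name `wilson_interval` is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Assumption: the abstract does not name its CI method; Wilson is one
    common choice for proportions from small samples.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Counts reported in the abstract for the general population sample.
ppv = 43 / 50    # positive predictive value
npv = 96 / 100   # negative predictive value
ppv_low, ppv_high = wilson_interval(43, 50)

print(f"PPV = {ppv:.2f} (95% CI {ppv_low:.2f}-{ppv_high:.2f})")
print(f"NPV = {npv:.2f}")
```

Under this assumption the PPV bounds round to 0.74 and 0.93, which agrees with the interval reported in the abstract.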
id	pubmed-7985804
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7985804 2021-05-07 J Med Internet Res Original Paper JMIR Publications 2021-03-08 /pmc/articles/PMC7985804/ /pubmed/33683212 http://dx.doi.org/10.2196/22951 Text en
license	©Yiqing Zhao, Sunyang Fu, Suzette J Bielinski, Paul A Decker, Alanna M Chamberlain, Veronique L Roger, Hongfang Liu, Nicholas B Larson. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 08.03.2021. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
