
Assessing stroke severity using electronic health record data: a machine learning approach

Bibliographic Details
Main Authors: Kogan, Emily, Twyman, Kathryn, Heap, Jesse, Milentijevic, Dejan, Lin, Jennifer H., Alberts, Mark
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6950922/
https://www.ncbi.nlm.nih.gov/pubmed/31914991
http://dx.doi.org/10.1186/s12911-019-1010-x
author Kogan, Emily
Twyman, Kathryn
Heap, Jesse
Milentijevic, Dejan
Lin, Jennifer H.
Alberts, Mark
collection PubMed
description BACKGROUND: Stroke severity is an important predictor of patient outcomes and is commonly measured with the National Institutes of Health Stroke Scale (NIHSS). Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include stroke severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data. METHODS: NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consists of 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) was held out for independent validation of model performance, and the remaining patients (n = 6116, 86%) were used for training the model. Several machine learning models were evaluated, and their parameters were optimized using cross-validation on the training set. The model with the best performance, a random forest model, was then evaluated on the holdout set. RESULTS: Leveraging machine learning, we identified the main factors in electronic health record data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores with the NLP-extracted NIHSS scores on the holdout data set yielded an R² (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-square error of 4.5. CONCLUSIONS: Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.
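The workflow described above (NLP extraction of NIHSS scores from physician notes, an 86%/14% train/holdout split, cross-validated hyperparameter tuning of a random forest regressor, and evaluation with R², Pearson R, and root-mean-square error) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the regular expression, the feature column names, and the parameter grid are hypothetical, and it assumes per-patient features have already been assembled in a pandas DataFrame.

# Minimal sketch (not the study's actual code) of the general approach described in
# the abstract: pull an NIHSS score out of free-text notes with a simple regular
# expression, train a random forest regressor on structured EHR features with
# cross-validated tuning, and evaluate on a held-out set with R^2, Pearson r, and
# RMSE. The regex, the feature column names, and the parameter grid are hypothetical.
import re

import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def extract_nihss(note_text):
    """Crude NLP stand-in: return the first 'NIHSS <n>' mention in a note, else None."""
    match = re.search(r"NIHSS[^0-9]{0,10}(\d{1,2})", note_text, flags=re.IGNORECASE)
    return float(match.group(1)) if match else None


def train_and_evaluate(patients: pd.DataFrame):
    """`patients` is assumed to hold one row per stroke patient, with hypothetical
    EHR-derived feature columns (e.g. died_same_month, length_of_stay, dysphagia_dx,
    hemiplegia_dx, discharged_home) and the NLP-extracted target column 'nihss'."""
    features = [c for c in patients.columns if c != "nihss"]
    # Hold out ~14% of patients for independent validation, mirroring the abstract.
    X_train, X_test, y_train, y_test = train_test_split(
        patients[features], patients["nihss"], test_size=0.14, random_state=0
    )
    # Tune hyperparameters by cross-validation on the training set only.
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid={"n_estimators": [200, 500], "max_depth": [None, 10, 20]},
        cv=5,
        scoring="r2",
    )
    search.fit(X_train, y_train)
    predicted = search.best_estimator_.predict(X_test)
    # Report the same holdout metrics the abstract cites.
    return {
        "r2": r2_score(y_test, predicted),
        "pearson_r": pearsonr(y_test, predicted)[0],
        "rmse": np.sqrt(mean_squared_error(y_test, predicted)),
    }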
format Online
Article
Text
id pubmed-6950922
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6950922 2020-01-09 Assessing stroke severity using electronic health record data: a machine learning approach. BMC Med Inform Decis Mak (Research Article). BioMed Central, 2020-01-08. © The Author(s). 2020. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Assessing stroke severity using electronic health record data: a machine learning approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6950922/
https://www.ncbi.nlm.nih.gov/pubmed/31914991
http://dx.doi.org/10.1186/s12911-019-1010-x