
Predictive Model for Risk of 30-Day Rehospitalization Using a Natural Language Processing/Machine Learning Approach Among Medicare Patients with Heart Failure


Bibliographic Details
Main authors: Kang, Youjeong; Hurdle, John
Format: Online Article Text
Language: English
Published: Elsevier Inc., 2020
Subjects:
008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7527192/
http://dx.doi.org/10.1016/j.cardfail.2020.09.023
author Kang, Youjeong
Hurdle, John
collection PubMed
description INTRODUCTION: Nearly 80% of all patients with heart failure (HF) are older adults (≥65 years of age). Prior studies have built predictive models that relied on structured data from electronic health records (EHRs) to predict the risk of 30-day rehospitalization for patients with HF. Structured data mostly included simple variables such as age and ethnicity. Prior studies rarely include clinical narrative data in free-text format (i.e., unstructured data), and no previous study has focused on using clinical narrative notes specifically for Medicare patients with HF in the acute-care setting. AIM: To identify clinical notes for building a predictive model of the risk of 30-day rehospitalization among Medicare patients with HF. METHODS: This study first used free-text discharge summaries and nursing care plans collected from June 1, 2015, to December 31, 2019, for 500 randomly selected Medicare patients with HF. Natural Language Processing (NLP): we iterated over standard text pre-processing steps, exploring the impact of n-gram length, term document frequency, word stemming, and the added value of part-of-speech tags. We chose two document representations: 1) Bag-of-Words (BOW), in which each document is represented by a vector based on the pre-processed text, and 2) Document Embedding, in which document terms are mapped to a dimension-reducing layer of length 300. The latter runs exceptionally fast (on the order of tens of seconds for 2,000 documents). Machine Learning (ML): the outputs of the BOW and Document Embedding models were fed to six conventional machine learning classifiers (logistic regression, support vector machine, random forest, k-nearest neighbors, a three-layer neural network, and Naïve Bayes). RESULTS: The mean age was 77 ± 7.9 years, and the mean length of hospital stay was 4.9 ± 4.8 days. The best BOW model using discharge summaries (n = 387) produced an area under the receiver operating characteristic curve (AUC) of 0.71 and an F1 score of 0.65; the best Document Embedding model yielded an AUC of 0.65 and an F1 score of 0.61. Using nursing care plans as the unit of analysis (n = 2,046), the NLP/ML models performed far better: the best BOW model yielded an AUC of 0.85 and an F1 score of 0.77, and the best Document Embedding model an AUC of 0.83 and an F1 score of 0.75. In all cases, we held out 33% of the data set for validation, repeating the random draw 10 times and averaging the performance results. CONCLUSIONS: We conclude that nursing care plans are a better predictor of 30-day rehospitalization risk than discharge summaries. Because nursing care plans are shorter than discharge summaries, they have the added advantage of faster processing. Since the faster Document Embedding model performs comparably to BOW, we suggest its use in future work on 30-day rehospitalization risk in Medicare patients with HF.
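The evaluation protocol summarized in the abstract (a bag-of-words representation with tunable n-gram length and document-frequency threshold, a conventional classifier, a 33% hold-out repeated 10 times, and averaged AUC and F1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. It swaps in multinomial Naïve Bayes (one of the six classifiers listed) as the only model. It also omits stemming, part-of-speech features, and Document Embedding, and it uses a stratified split so that every draw contains both classes. The function names, smoothing constant, parameter defaults, and toy notes are all hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (no stemming or POS tags here)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n_max):
    """All contiguous 1..n_max-grams, each joined with a space."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def build_vocab(docs, n_max, min_df):
    """Keep n-grams that appear in at least min_df training documents."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(tokenize(doc), n_max)))
    return {term for term, count in df.items() if count >= min_df}


def bag_of_words(doc, vocab, n_max):
    """Sparse bag-of-words vector: in-vocabulary n-gram -> count."""
    return Counter(t for t in ngrams(tokenize(doc), n_max) if t in vocab)


def train_naive_bayes(docs, labels, vocab, n_max, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing for 0/1 labels."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(bag_of_words(doc, vocab, n_max))
    log_prior, log_lik = {}, {}
    for y in (0, 1):
        total = sum(counts[y].values()) + alpha * len(vocab)
        log_prior[y] = math.log(labels.count(y) / len(labels))
        log_lik[y] = {t: math.log((counts[y][t] + alpha) / total) for t in vocab}
    return log_prior, log_lik


def log_odds(model, doc, vocab, n_max):
    """log P(y=1 | doc) - log P(y=0 | doc); > 0 means predicted 'readmitted'."""
    log_prior, log_lik = model
    bow = bag_of_words(doc, vocab, n_max)
    post = {y: log_prior[y] + sum(c * log_lik[y][t] for t, c in bow.items())
            for y in (0, 1)}
    return post[1] - post[0]


def roc_auc(y_true, scores):
    """AUC as the probability a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))  # needs both classes in y_true


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def repeated_holdout(docs, labels, repeats=10, test_frac=0.33,
                     n_max=2, min_df=1, seed=0):
    """Hold out test_frac of each class, repeat, and average AUC and F1."""
    rng = random.Random(seed)
    aucs, f1s = [], []
    for _ in range(repeats):
        test = set()
        for cls in (0, 1):  # assumption: stratified draw keeps both classes in each split
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            test.update(idx[:max(1, round(test_frac * len(idx)))])
        train = [i for i in range(len(docs)) if i not in test]
        train_docs = [docs[i] for i in train]
        vocab = build_vocab(train_docs, n_max, min_df)  # fit on training split only
        model = train_naive_bayes(train_docs, [labels[i] for i in train], vocab, n_max)
        y_true = [labels[i] for i in sorted(test)]
        scores = [log_odds(model, docs[i], vocab, n_max) for i in sorted(test)]
        aucs.append(roc_auc(y_true, scores))
        f1s.append(f1_score(y_true, [1 if s > 0 else 0 for s in scores]))
    return sum(aucs) / repeats, sum(f1s) / repeats


# Hypothetical toy notes (not study data); label 1 = rehospitalized within 30 days.
notes = ([f"patient with edema and dyspnea {w}" for w in
          ["overnight", "orthopnea", "fatigue", "cough", "wheezing", "dizziness"]] +
         [f"patient stable and ambulating {w}" for w in
          ["independently", "comfortably", "daily", "safely", "outdoors", "briskly"]])
labels = [1] * 6 + [0] * 6
mean_auc, mean_f1 = repeated_holdout(notes, labels)
print(f"mean AUC={mean_auc:.2f}, mean F1={mean_f1:.2f}")  # prints: mean AUC=1.00, mean F1=1.00
```

The toy notes are trivially separable, so the perfect scores only show that the pipeline runs. They say nothing about real clinical text. The vocabulary is rebuilt inside each draw from the training split alone, so terms seen only in held-out notes never leak into the features. Thresholding the log-odds at 0 is equivalent to predicting "readmitted" when the posterior probability exceeds 0.5.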
format Online
Article
Text
id pubmed-7527192
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Published by Elsevier Inc.
record_format MEDLINE/PubMed
spelling pubmed-7527192 2020-10-01 J Card Fail 008 Published by Elsevier Inc. 2020-10 2020-09-30 /pmc/articles/PMC7527192/ http://dx.doi.org/10.1016/j.cardfail.2020.09.023 Text en Copyright © 2020 Published by Elsevier Inc. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website.
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Predictive Model for Risk of 30-Day Rehospitalization Using a Natural Language Processing/Machine Learning Approach Among Medicare Patients with Heart Failure
topic 008
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7527192/
http://dx.doi.org/10.1016/j.cardfail.2020.09.023