Predictive Model for Risk of 30-Day Rehospitalization Using a Natural Language Processing/Machine Learning Approach Among Medicare Patients with Heart Failure
INTRODUCTION: Nearly 80% of all patients with heart failure (HF) are older adults (≥65 years of age). Prior studies have built predictive models that relied on structured data from electronic health records (EHRs) to predict the risk of 30-day rehospitalization for patients with HF. Structured data...
Main authors: | Kang, Youjeong; Hurdle, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Published by Elsevier Inc. 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7527192/ http://dx.doi.org/10.1016/j.cardfail.2020.09.023 |
_version_ | 1783589005164544000 |
---|---|
author | Kang, Youjeong Hurdle, John |
author_facet | Kang, Youjeong Hurdle, John |
author_sort | Kang, Youjeong |
collection | PubMed |
description | INTRODUCTION: Nearly 80% of all patients with heart failure (HF) are older adults (≥65 years of age). Prior studies have built predictive models that relied on structured data from electronic health records (EHRs) to predict the risk of 30-day rehospitalization for patients with HF. Structured data mostly included simple vocabularies such as age and ethnicity. Prior studies have rarely included clinical narrative data in a free-text format (i.e., unstructured data). No previous study has focused on using clinical narrative notes specifically for Medicare patients with HF in the acute-care setting. AIM: To identify clinical notes for building a predictive model for risk of 30-day rehospitalization among Medicare patients with HF. METHODS: This study first used free-text discharge summary notes and nursing care plans collected from June 1, 2015 to December 31, 2019, for 500 randomly selected Medicare patients with HF. Natural Language Processing (NLP): we iterated over standard text pre-processing steps, exploring the impact of n-gram length, term document-frequency, word stemming, and the added value of parts-of-speech. We chose two models: 1) the classification model called Bag-of-Words (BOW), where each document is represented by a vector based on the pre-processed text, and 2) Document Embedding, where document terms are mapped to a dimension-reducing layer (length equals 300). The latter runs exceptionally fast (on the order of tens of seconds for 2,000 documents). Machine Learning (ML): the outputs of the NLP BOW and Document Embedding models were fed to six different conventional machine learning systems (logistic regression, support vector machine, random forest, k-nearest neighbor clustering, a three-layer neural network, and Naïve Bayes). RESULTS: The mean age was 77 ± 7.9 years, and the average length of hospital stay was 4.9 ± 4.8 days. The best BOW model we found using discharge summaries (n = 387) produced an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.71 and an F1 score of 0.65. The best Document Embedding model yielded an AUC of 0.65 and an F1 score of 0.61. Using nursing care notes as the unit of analysis (n = 2,046), the NLP/ML models performed far better. The best BOW model on care plans achieved an AUC of 0.85 and an F1 score of 0.77. The best Document Embedding model produced an AUC of 0.83 and an F1 score of 0.75. In all cases we held out 33% of the data set for validation, repeating a random draw 10 times and averaging the performance results. CONCLUSIONS: We conclude that nursing care plans are a better predictor of 30-day rehospitalization risk than discharge summaries. Because nursing care plans are shorter than discharge summaries, they have the added advantage of faster processing. Since the faster Document Embedding model's performance is comparable to that of BOW, we suggest its use in future work in the area of 30-day rehospitalization risk in Medicare patients with HF. |
format | Online Article Text |
id | pubmed-7527192 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Published by Elsevier Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7527192 2020-10-01 Predictive Model for Risk of 30-Day Rehospitalization Using a Natural Language Processing/Machine Learning Approach Among Medicare Patients with Heart Failure Kang, Youjeong Hurdle, John J Card Fail 008 Published by Elsevier Inc. 2020-10 2020-09-30 /pmc/articles/PMC7527192/ http://dx.doi.org/10.1016/j.cardfail.2020.09.023 Text en Copyright © 2020 Published by Elsevier Inc. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | 008 Kang, Youjeong Hurdle, John Predictive Model for Risk of 30-Day Rehospitalization Using a Natural Language Processing/Machine Learning Approach Among Medicare Patients with Heart Failure |
title | Predictive Model for Risk of 30-Day Rehospitalization Using a Natural Language Processing/Machine Learning Approach Among Medicare Patients with Heart Failure |
title_sort | predictive model for risk of 30-day rehospitalization using a natural language processing/machine learning approach among medicare patients with heart failure |
topic | 008 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7527192/ http://dx.doi.org/10.1016/j.cardfail.2020.09.023 |
work_keys_str_mv | AT kangyoujeong predictivemodelforriskof30dayrehospitalizationusinganaturallanguageprocessingmachinelearningapproachamongmedicarepatientswithheartfailure AT hurdlejohn predictivemodelforriskof30dayrehospitalizationusinganaturallanguageprocessingmachinelearningapproachamongmedicarepatientswithheartfailure |
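
The abstract in the description field outlines a pipeline: bag-of-words features built from n-grams, a conventional classifier, and a 33% hold-out repeated 10 times with AUC and F1 averaged across draws. The record does not include the authors' code, so the following is only a minimal sketch of that evaluation loop under stated assumptions. It uses Naïve Bayes (one of the six classifiers the abstract names) written in plain Python, a stratified split (the abstract does not say whether the draw was stratified), and whitespace tokenization with no stemming, document-frequency filtering, or part-of-speech features. All function names and the toy notes in the usage example are hypothetical.

```python
import math
import random
from collections import Counter


def bow_features(text, max_n=2):
    """Count every n-gram from unigrams up to max_n-grams in one document."""
    tokens = text.lower().split()
    grams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing."""
    counts = {0: Counter(), 1: Counter()}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    vocab = set(counts[0]) | set(counts[1])
    n_pos = sum(labels)
    # Smoothed class priors so a one-class training split cannot yield log(0).
    prior_pos = (n_pos + 1) / (len(labels) + 2)
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    return counts, prior_pos, totals, vocab


def score(model, doc):
    """Log-odds that a document belongs to class 1 (rehospitalized)."""
    counts, prior_pos, totals, vocab = model
    log_odds = math.log(prior_pos) - math.log(1 - prior_pos)
    for term, k in doc.items():
        if term in vocab:  # terms unseen in training are ignored
            log_odds += k * (math.log((counts[1][term] + 1) / totals[1])
                             - math.log((counts[0][term] + 1) / totals[0]))
    return log_odds


def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(preds, labels):
    """F1 score for the positive class."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def repeated_holdout(texts, labels, test_frac=0.33, repeats=10, seed=0):
    """Average AUC and F1 over repeated stratified hold-out draws."""
    rng = random.Random(seed)
    docs = [bow_features(t) for t in texts]
    aucs, f1s = [], []
    for _ in range(repeats):
        test = set()
        for c in (0, 1):  # hold out test_frac of each class separately
            idx = [i for i, y in enumerate(labels) if y == c]
            rng.shuffle(idx)
            test.update(idx[:max(1, round(test_frac * len(idx)))])
        train = [i for i in range(len(docs)) if i not in test]
        model = train_naive_bayes([docs[i] for i in train],
                                  [labels[i] for i in train])
        test_idx = sorted(test)
        scores = [score(model, docs[i]) for i in test_idx]
        y_true = [labels[i] for i in test_idx]
        aucs.append(auc(scores, y_true))
        f1s.append(f1([int(s > 0) for s in scores], y_true))
    return sum(aucs) / repeats, sum(f1s) / repeats


if __name__ == "__main__":
    # Invented toy notes, not study data: 1 = rehospitalized within 30 days.
    texts = (["worsening edema dyspnea noted"] * 6
             + ["stable ambulating diet tolerated"] * 6)
    labels = [1] * 6 + [0] * 6
    mean_auc, mean_f1 = repeated_holdout(texts, labels)
    print(f"mean AUC={mean_auc:.2f}, mean F1={mean_f1:.2f}")
```

Because the toy notes are perfectly separable, the printed metrics say nothing about real clinical text; the example only shows how the averaged hold-out figures reported in the abstract could be computed. The Document Embedding model is not sketched here, since the record gives no detail beyond its length of 300.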