A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study

BACKGROUND: The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. METHODS: Our analytical approach involves three steps:...

Bibliographic Details
Main Authors: AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4074427/
https://www.ncbi.nlm.nih.gov/pubmed/24886637
http://dx.doi.org/10.1186/1472-6947-14-41
_version_ 1782323216591618048
author AbdelRahman, Samir E
Zhang, Mingyuan
Bray, Bruce E
Kawamoto, Kensaku
collection PubMed
description BACKGROUND: The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. METHODS: Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as being strong, regular, or weak. RESULTS: The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier which averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. CONCLUSION: The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
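The pre-processing step and the two missing-value adjustment methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of None to mark a missing value, and the example variables (age, hemoglobin, bnp) are all assumptions made for illustration, and every record is assumed to carry the same numeric variables.

```python
from statistics import mean

MISSING = None  # assumption: a missing value is encoded as None


def drop_sparse_variables(records, max_missing_fraction=0.5):
    """Remove variables missing in MORE than max_missing_fraction of records."""
    if not records:
        return []
    n = len(records)
    kept = [
        name for name in records[0]
        if sum(r[name] is MISSING for r in records) / n <= max_missing_fraction
    ]
    return [{name: r[name] for name in kept} for r in records]


def complete_case(records):
    """Complete-case analysis: keep only records with no missing values."""
    return [r for r in records if all(v is not MISSING for v in r.values())]


def mean_impute(records):
    """Mean-based imputation: fill each gap with the variable's observed mean."""
    if not records:
        return []
    means = {}
    for name in records[0]:
        observed = [r[name] for r in records if r[name] is not MISSING]
        means[name] = mean(observed) if observed else MISSING
    return [
        {name: (means[name] if v is MISSING else v) for name, v in r.items()}
        for r in records
    ]


# Hypothetical toy records; the variable names and values are illustrative only.
records = [
    {"age": 70, "hemoglobin": 11.0, "bnp": None},
    {"age": 80, "hemoglobin": None, "bnp": None},
    {"age": 60, "hemoglobin": 13.0, "bnp": None},
    {"age": 90, "hemoglobin": 12.0, "bnp": 400},
]

filtered = drop_sparse_variables(records)   # "bnp" is missing in 3 of 4 records
complete = complete_case(filtered)          # drops the record lacking hemoglobin
imputed = mean_impute(filtered)             # fills that gap with the observed mean
print(len(complete), imputed[1]["hemoglobin"])
```

The two adjustment methods trade off differently: complete-case analysis shrinks the dataset but leaves observed values untouched, while mean imputation keeps every record at the cost of flattening variance in the imputed variable.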
format Online
Article
Text
id pubmed-4074427
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4074427 2014-06-29 A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku BMC Med Inform Decis Mak Research Article BioMed Central 2014-05-27 /pmc/articles/PMC4074427/ /pubmed/24886637 http://dx.doi.org/10.1186/1472-6947-14-41 Text en Copyright © 2014 AbdelRahman et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4074427/
https://www.ncbi.nlm.nih.gov/pubmed/24886637
http://dx.doi.org/10.1186/1472-6947-14-41