
A new analytical framework for missing data imputation and classification with uncertainty: Missing data imputation and heart failure readmission prediction

Bibliographic Details
Main authors: Hu, Zhiyong; Du, Dongping
Format: Online Article Text
Language: English
Published: Public Library of Science, 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7505424/
https://www.ncbi.nlm.nih.gov/pubmed/32956366
http://dx.doi.org/10.1371/journal.pone.0237724
collection PubMed
description BACKGROUND: The wide adoption of electronic health record (EHR) systems has provided vast opportunities to advance health care services. However, the prevalence of missing values in EHR systems poses a great challenge to data analysis in support of clinical decision-making. The objective of this study is to develop a new methodological framework that can address the missing data challenge and provide a reliable tool to predict hospital readmission among heart failure patients. METHODS: We used a Gaussian Process Latent Variable Model (GPLVM) to impute the missing values. Specifically, a lower-dimensional embedding was learned from a small complete dataset and then used to impute the missing values in the incomplete dataset. The GPLVM-based imputation provides both a mean estimate and the uncertainty associated with that estimate. To incorporate this uncertainty into prediction, a constrained support vector machine (cSVM) was developed to obtain robust predictions. We first sampled multiple datasets from the distributions of input uncertainty and trained a support vector machine on each dataset. An optimal classifier was then identified by selecting the support vectors that maximize the separation margin of a newly sampled dataset and minimize the similarity with the pre-trained support vectors. RESULTS: The proposed model was derived and validated using the PhysioNet MIMIC-III clinical database. The GPLVM imputation yielded normalized mean absolute errors of 0.11 and 0.12 when 20% and 30% of instances, respectively, contained missing values, and the confidence bounds of the estimates captured 97% of the true values. The cSVM model achieved an average area under the curve (AUC) of 0.68, a 7% improvement in prediction accuracy compared with some existing classifiers.
CONCLUSIONS: The proposed method provides accurate imputation of missing values and achieves better prediction performance than existing models that can only handle deterministic inputs.
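The abstract reports two imputation metrics (normalized mean absolute error and the share of true values inside the confidence bounds) and describes sampling several completed datasets from the imputation uncertainty before training one SVM per dataset. The Python sketch below illustrates those three quantities under stated assumptions; it is not the authors' code. The following are hypothetical choices made here, not details from the paper: independent Gaussian uncertainty per imputed entry, "normalized" meaning division by the range of the true values, confidence bounds of mean ± 1.96·sd, and the function names `normalized_mae`, `coverage` and `sample_datasets`.

```python
"""Sketch of the uncertainty-handling steps summarized in the abstract.

Hypothetical assumptions (not from the paper):
- each imputed entry has an independent Gaussian N(mean, sd**2);
- "normalized MAE" means MAE divided by the range of the true values;
- "confidence bounds" means mean +/- z * sd with z = 1.96.
"""
import random
from typing import List, Optional, Sequence


def normalized_mae(true: Sequence[float], est: Sequence[float]) -> float:
    """Mean absolute error divided by (max - min) of the true values."""
    mae = sum(abs(t - e) for t, e in zip(true, est)) / len(true)
    return mae / (max(true) - min(true))


def coverage(true: Sequence[float], mean: Sequence[float],
             sd: Sequence[float], z: float = 1.96) -> float:
    """Fraction of true values falling inside mean +/- z * sd."""
    hits = sum(1 for t, m, s in zip(true, mean, sd)
               if m - z * s <= t <= m + z * s)
    return hits / len(true)


def sample_datasets(observed: List[Optional[float]], mean: Sequence[float],
                    sd: Sequence[float], k: int,
                    seed: int = 0) -> List[List[float]]:
    """Draw k completed copies of one feature column.

    Observed entries are copied unchanged; each missing entry (None) at
    position i is replaced by a draw from N(mean[i], sd[i]**2). Values of
    mean and sd at observed positions are ignored. In the described
    workflow one SVM would be trained per copy; that step is not shown.
    """
    rng = random.Random(seed)
    return [[x if x is not None else rng.gauss(mean[i], sd[i])
             for i, x in enumerate(observed)]
            for _ in range(k)]


if __name__ == "__main__":
    true = [1.0, 2.0, 3.0, 4.0, 5.0]
    est = [1.2, 1.8, 3.1, 4.4, 5.0]
    print(normalized_mae(true, est))        # about 0.045
    print(coverage(true, est, [0.1] * 5))   # 0.4 (2 of 5 values covered)
    for copy in sample_datasets([1.0, None, 3.0], [0.0, 2.0, 0.0],
                                [0.0, 0.5, 0.0], k=3):
        print(copy)
```

Only the sampling step is sketched; the per-copy SVM training and the cSVM support-vector selection are not reproduced, because the abstract does not give their formulation.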
id pubmed-7505424
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7505424 2020-09-30 PLoS One Research Article Public Library of Science 2020-09-21 /pmc/articles/PMC7505424/ /pubmed/32956366 http://dx.doi.org/10.1371/journal.pone.0237724 Text en © 2020 Hu, Du http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article