
A Pragmatic Ensemble Strategy for Missing Values Imputation in Health Records

Pristine and trustworthy data are required for efficient computer modelling for medical decision-making, yet data in medical care is frequently missing. As a result, missing values may occur not just in training data but also in testing data that might contain a single undiagnosed episode or a parti...


Bibliographic Details
Main Authors: Batra, Shivani; Khurana, Rohan; Khan, Mohammad Zubair; Boulila, Wadii; Koubaa, Anis; Srivastava, Prakash
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9030272/
https://www.ncbi.nlm.nih.gov/pubmed/35455196
http://dx.doi.org/10.3390/e24040533
_version_ 1784692097892220928
author Batra, Shivani
Khurana, Rohan
Khan, Mohammad Zubair
Boulila, Wadii
Koubaa, Anis
Srivastava, Prakash
author_facet Batra, Shivani
Khurana, Rohan
Khan, Mohammad Zubair
Boulila, Wadii
Koubaa, Anis
Srivastava, Prakash
author_sort Batra, Shivani
collection PubMed
description Pristine and trustworthy data are required for efficient computer modelling for medical decision-making, yet data in medical care are frequently missing. As a result, missing values may occur not just in training data but also in testing data that might contain a single undiagnosed episode or a participant. This study evaluates different imputation and regression procedures, identified based on regressor performance and computational expense, to address missing values in both training and testing datasets. In the context of healthcare, several procedures have been introduced for dealing with missing values. However, there is still debate about which imputation strategies are better in specific cases. This research proposes an ensemble imputation model that is trained to use a combination of simple mean imputation, k-nearest neighbour imputation, and iterative imputation methods, and then leverages them so that the ideal imputation strategy is selected from among them based on the attribute correlations of the features with missing values. We introduce a unique Ensemble Strategy for Missing Value to analyse healthcare data with considerable missing values and to obtain unbiased and accurate statistical prediction models. The performance metrics have been generated using the eXtreme gradient boosting regressor, random forest regressor, and support vector regressor. The current study uses real-world healthcare data to conduct experiments and simulations of data with varying feature-wise missing frequencies; the results indicate that the proposed technique surpasses standard missing value imputation approaches, as well as the approach of dropping records holding missing values, in terms of accuracy.
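The selection idea in the description (choose among mean, k-nearest neighbour, and iterative imputation per feature, guided by attribute correlations) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the selection rule, so the thresholds `low` and `high`, the helper names, and the toy data are all hypothetical, and a single-pass linear regression on the most correlated column stands in for full iterative imputation. Only the Python standard library is used.

```python
import math
from statistics import mean


def pearson(rows, a, b):
    """Pearson correlation of columns a and b over rows where both are observed."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    if len(pairs) < 3:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in pairs) / math.sqrt(sxx * syy)


def mean_value(rows, j):
    """Mean of the observed values in column j."""
    return mean(r[j] for r in rows if r[j] is not None)


def knn_value(rows, i, j, k=2):
    """Average column j over the k rows closest to row i on shared observed columns."""
    target, scored = rows[i], []
    for r in rows:
        if r is target or r[j] is None:
            continue
        shared = [c for c in range(len(r))
                  if c != j and r[c] is not None and target[c] is not None]
        if shared:
            dist = math.sqrt(sum((r[c] - target[c]) ** 2 for c in shared) / len(shared))
            scored.append((dist, r[j]))
    if not scored:
        return mean_value(rows, j)
    scored.sort(key=lambda t: t[0])
    return mean(v for _, v in scored[:k])


def regression_value(rows, i, j, p):
    """Predict column j for row i from column p by least squares (single pass)."""
    pairs = [(r[p], r[j]) for r in rows if r[p] is not None and r[j] is not None]
    x = rows[i][p]
    if x is None or len(pairs) < 2:
        return mean_value(rows, j)
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    if sxx == 0:
        return my
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    return my + slope * (x - mx)


def ensemble_impute(rows, low=0.3, high=0.7, k=2):
    """Pick one strategy per incomplete column from its strongest correlation.

    The cut-offs `low` and `high` are illustrative assumptions, not values
    taken from the paper.
    """
    n_cols = len(rows[0])
    out = [list(r) for r in rows]  # copy so the input is left untouched
    choices = {}
    for j in range(n_cols):
        if all(r[j] is not None for r in rows):
            continue
        corr = {p: abs(pearson(rows, j, p)) for p in range(n_cols) if p != j}
        best = max(corr, key=corr.get)
        if corr[best] < low:
            choices[j] = "mean"
        elif corr[best] < high:
            choices[j] = "knn"
        else:
            choices[j] = "regression"
        for i, r in enumerate(rows):
            if r[j] is not None:
                continue
            if choices[j] == "mean":
                out[i][j] = mean_value(rows, j)
            elif choices[j] == "knn":
                out[i][j] = knn_value(rows, i, j, k)
            else:
                out[i][j] = regression_value(rows, i, j, best)
    return out, choices


# Toy data (hypothetical): column 1 is exactly twice column 0,
# column 2 is only weakly related to the others.
rows = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, None, 3.0],
    [4.0, 8.0, 5.0],
    [5.0, 10.0, None],
]
imputed, choices = ensemble_impute(rows)
print(choices)
print(imputed)
```

In this toy case the strongly correlated column is filled by regression and the weakly correlated one falls back to the column mean. With scikit-learn available, `SimpleImputer`, `KNNImputer`, and `IterativeImputer` would be the natural building blocks for the three strategies named in the description.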
format Online
Article
Text
id pubmed-9030272
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9030272 2022-04-23 A Pragmatic Ensemble Strategy for Missing Values Imputation in Health Records Batra, Shivani Khurana, Rohan Khan, Mohammad Zubair Boulila, Wadii Koubaa, Anis Srivastava, Prakash Entropy (Basel) Article
MDPI 2022-04-10 /pmc/articles/PMC9030272/ /pubmed/35455196 http://dx.doi.org/10.3390/e24040533 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Batra, Shivani
Khurana, Rohan
Khan, Mohammad Zubair
Boulila, Wadii
Koubaa, Anis
Srivastava, Prakash
A Pragmatic Ensemble Strategy for Missing Values Imputation in Health Records
title A Pragmatic Ensemble Strategy for Missing Values Imputation in Health Records
title_sort pragmatic ensemble strategy for missing values imputation in health records
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9030272/
https://www.ncbi.nlm.nih.gov/pubmed/35455196
http://dx.doi.org/10.3390/e24040533