
Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the ob...


Bibliographic Details
Main authors: Huang, Min-Wei, Lin, Wei-Chao, Tsai, Chih-Fong
Format: Online Article Text
Language: English
Published: Hindawi 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5823414/
https://www.ncbi.nlm.nih.gov/pubmed/29599943
http://dx.doi.org/10.1155/2018/1817479
author Huang, Min-Wei
Lin, Wei-Chao
Tsai, Chih-Fong
collection PubMed
description Many real-world medical datasets contain some proportion of missing (attribute) values. Missing value imputation is commonly used to address this problem: the missing values are estimated by a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimates may be unreliable or may even differ considerably from the real values. The aim of this paper is to examine whether combining instance selection from the observed data with missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms (DROP3, GA, and IB3) and three imputation algorithms (KNNI, MLP, and SVM) are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for mixed-type medical datasets. However, instance selection does not have a clearly positive impact on the imputation results for categorical medical datasets.
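The pipeline described in the abstract (select instances from the complete observed rows, then impute from the selected rows only) can be sketched roughly as follows. This is a minimal illustration for numerical data, not the authors' implementation: the z-score filter is a simple stand-in for instance selection and is not DROP3, GA, or IB3, and the function names `select_instances`, `knn_impute`, and `impute_with_selection`, as well as the parameters `k` and `z_max`, are hypothetical.

```python
import numpy as np


def select_instances(complete, z_max=3.0):
    """Keep rows whose features all lie within z_max standard deviations.

    Assumption: a plain z-score filter stands in for instance selection.
    It is NOT DROP3, GA, or IB3.
    """
    mean = complete.mean(axis=0)
    std = complete.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero on constant columns
    z = np.abs((complete - mean) / std)
    return complete[(z <= z_max).all(axis=1)]


def knn_impute(data, reference, k=3):
    """Fill NaNs in each row with the mean of its k nearest reference rows.

    Distance is Euclidean over the features observed in the incomplete row.
    """
    filled = data.copy()
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        dists = np.linalg.norm(reference[:, ~missing] - row[~missing], axis=1)
        nearest = np.argsort(dists)[:k]
        filled[i, missing] = reference[nearest][:, missing].mean(axis=0)
    return filled


def impute_with_selection(data, k=3, z_max=3.0):
    """Run instance selection on the complete rows, then KNN imputation."""
    complete = data[~np.isnan(data).any(axis=1)]
    reference = select_instances(complete, z_max)
    return knn_impute(data, reference, k)
```

Passing `z_max=np.inf` disables the filter, which gives a baseline of imputation alone to compare against imputation after selection. Note that on very small samples a z-score threshold of 3 may never trigger, so a lower threshold may be needed there.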
format Online
Article
Text
id pubmed-5823414
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-5823414 2018-03-29 Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets Huang, Min-Wei Lin, Wei-Chao Tsai, Chih-Fong J Healthc Eng Research Article Hindawi 2018-02-04 /pmc/articles/PMC5823414/ /pubmed/29599943 http://dx.doi.org/10.1155/2018/1817479 Text en Copyright © 2018 Min-Wei Huang et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets
topic Research Article