
An Efficient and Effective Model to Handle Missing Data in Classification

Missing data is one of the most important causes of reduced classification accuracy. Many real datasets suffer from missing values, especially in the medical sciences. Imputation is a common way to deal with incomplete datasets. There are various imputation methods that can be applied, and the choi...

Bibliographic Details
Main authors:	Mehrabani-Zeinabad, Kamran; Doostfatemeh, Marziyeh; Ayatollahi, Seyyed Mohammad Taghi
Format:	Online Article Text
Language:	English
Published:	Hindawi 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710403/ https://www.ncbi.nlm.nih.gov/pubmed/33299878 http://dx.doi.org/10.1155/2020/8810143

author	Mehrabani-Zeinabad, Kamran; Doostfatemeh, Marziyeh; Ayatollahi, Seyyed Mohammad Taghi
author_sort	Mehrabani-Zeinabad, Kamran
collection	PubMed
description	Missing data is one of the most important causes of reduced classification accuracy. Many real datasets suffer from missing values, especially in the medical sciences. Imputation is a common way to deal with incomplete datasets. There are various imputation methods that can be applied, and the choice of the best method depends on dataset conditions such as sample size, percentage of missing values, and missingness mechanism. A better solution is therefore to classify incomplete datasets without imputation and without any loss of information. The structure of the “Bayesian additive regression trees” (BART) model is improved with the “Missingness Incorporated in Attributes” (MIA) approach to address its inefficiency in handling missingness. The implementation of MIA-within-BART is named “BART.m”. As the abilities of BART.m have not been investigated in the classification of incomplete datasets, this simulation-based study aimed to provide such an evaluation. The results indicate that BART.m can be used even for datasets with 90 percent missing values and, more importantly, that it identifies irrelevant variables and removes them on its own. BART.m outperforms common models for classification with incomplete data in terms of accuracy and computational time. Based on these properties, BART.m can be regarded as a high-accuracy model for the classification of incomplete datasets that avoids any assumptions and preprocessing steps.
format	Online Article Text
id	pubmed-7710403
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-7710403 2020-12-08 Biomed Res Int Research Article Hindawi 2020-11-25 /pmc/articles/PMC7710403/ /pubmed/33299878 http://dx.doi.org/10.1155/2020/8810143 Text en Copyright © 2020 Kamran Mehrabani-Zeinabad et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An Efficient and Effective Model to Handle Missing Data in Classification
topic	Research Article
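
The description field in this record names the "Missingness Incorporated in Attributes" (MIA) approach without spelling it out. As commonly described, MIA lets a tree treat missingness as information: at each candidate split, missing values may be sent left, sent right, or separated from all observed values. The sketch below illustrates that idea for a single feature. It is an assumption-laden illustration only: the function names are hypothetical, Gini impurity is swapped in as a simple scoring rule, and it is not the BART.m implementation, which places these splitting rules inside a Bayesian sum-of-trees model rather than a greedy search.

```python
import math


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def weighted_gini(left, right):
    """Size-weighted average impurity of the two child nodes."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def best_mia_split(values, labels):
    """Return (impurity, rule, threshold) for one feature under MIA.

    rule is 'missing_left', 'missing_right' or 'missing_vs_observed'.
    Missing entries are None or NaN. Raises ValueError if no split exists.
    """
    is_missing = [
        v is None or (isinstance(v, float) and math.isnan(v)) for v in values
    ]
    observed = sorted({v for v, m in zip(values, is_missing) if not m})
    # Candidate thresholds: midpoints between consecutive observed values.
    thresholds = [(a + b) / 2 for a, b in zip(observed, observed[1:])]

    candidates = []
    for c in thresholds:
        # MIA options 1 and 2: split at c, missing rows go left or right.
        for rule in ("missing_left", "missing_right"):
            left, right = [], []
            for v, m, y in zip(values, is_missing, labels):
                go_left = (rule == "missing_left") if m else (v <= c)
                (left if go_left else right).append(y)
            candidates.append((weighted_gini(left, right), rule, c))

    # MIA option 3: split on missingness itself (missing vs. observed).
    miss = [y for m, y in zip(is_missing, labels) if m]
    obs = [y for m, y in zip(is_missing, labels) if not m]
    if miss and obs:
        candidates.append((weighted_gini(miss, obs), "missing_vs_observed", None))

    # Lowest impurity wins; ties go to the earliest candidate.
    return min(candidates, key=lambda t: t[0])


nan = float("nan")
# Missingness alone predicts the class, so option 3 is chosen.
split_a = best_mia_split([1.0, 2.0, nan, nan, 3.0, 4.0], [0, 0, 1, 1, 0, 0])
# The missing row behaves like the large values, so it is sent right.
split_b = best_mia_split([1.0, 2.0, nan, 3.0, 4.0], [0, 0, 1, 1, 1])
```

In the first call the missing rows form their own pure node; in the second they join the right-hand branch of an ordinary threshold split. No imputation step is involved in either case.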
