Discretization and Feature Selection Based on Bias Corrected Mutual Information Considering High-Order Dependencies

Bibliographic Details
Main authors: Roy, Puloma; Sharmin, Sadia; Ali, Amin Ahsan; Shoyaib, Mohammad
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206174/
http://dx.doi.org/10.1007/978-3-030-47426-3_64
collection PubMed
description Mutual Information (MI)-based feature selection methods are popular because they can capture nonlinear relationships among variables. However, existing works rarely address the error (bias) that arises from estimating MI with finite samples. To the best of our knowledge, none of the existing methods addresses the bias of the high-order interaction term, which is essential for a better approximation of joint MI. In this paper, we first calculate the amount of bias in this term. Moreover, to select features using a [Formula: see text]-based search, we also show that this term follows a [Formula: see text] distribution. Based on these two theoretical results, we propose Discretization and feature Selection based on bias corrected Mutual information (DSbM). DSbM is then extended with simultaneous forward selection and backward elimination (DSbM[Formula: see text]). On twenty benchmark datasets, we demonstrate the superiority of DSbM over four state-of-the-art methods in terms of accuracy and the number of selected features. Experimental results also demonstrate that DSbM outperforms the existing methods in terms of accuracy, Pareto optimality, and the Friedman test. We also observe that, compared to DSbM, DSbM[Formula: see text] selects fewer features and achieves higher accuracy on some datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this chapter (10.1007/978-3-030-47426-3_64) contains supplementary material, which is available to authorized users.
id pubmed-7206174
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7206174 2020-05-08 Advances in Knowledge Discovery and Data Mining Article
2020-04-17 /pmc/articles/PMC7206174/ http://dx.doi.org/10.1007/978-3-030-47426-3_64 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
topic Article