Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy

Feature selection is an important preprocessing step in pattern recognition. In this paper, we present a new information theory-based feature selection approach for two-class classification problems, named minimum Distribution Similarity with Removed Redundancy (mDSRR). Unlike previous methods, which rank features using mutual information and greedy iteration with a loss function, we rank features according to the similarity of their distributions in the two classes, measured by relative entropy, and then remove highly redundant features from the sorted feature subsets. Experimental results on datasets from a variety of fields with different classifiers highlight the value of mDSRR for selecting feature subsets, especially small ones, and mDSRR outperforms other state-of-the-art methods in most cases. In addition, we observe that mutual information may not be a good criterion for selecting the initial feature in methods that rely on subsequent iterations.
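As a reading aid, the following is a minimal Python sketch of the two stages the abstract describes: scoring each feature by the symmetric relative entropy between its class-conditional histograms, then pruning redundant features from the ranked list. The histogram binning and the use of Pearson correlation as the redundancy measure are assumptions, not details taken from the paper, and this is not the authors' implementation.

# Hypothetical sketch, not the authors' code: mDSRR-style ranking and pruning.
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes the KL divergence D(p || q)

def rank_by_distribution_dissimilarity(X, y, n_bins=20):
    """Score each feature by the symmetric relative entropy between its
    class-conditional histograms; a higher score means less similar distributions."""
    scores = []
    for j in range(X.shape[1]):
        bins = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)
        p, _ = np.histogram(X[y == 0, j], bins=bins)
        q, _ = np.histogram(X[y == 1, j], bins=bins)
        p = p + 1e-10  # smooth empty bins; entropy() normalizes the counts internally
        q = q + 1e-10
        scores.append(entropy(p, q) + entropy(q, p))  # symmetric KL divergence
    return list(np.argsort(scores)[::-1])  # most dissimilar (most discriminative) first

def remove_redundant(X, ranked, corr_threshold=0.9):
    """Walk the ranked list and keep a feature only if it is not highly
    correlated with any feature already kept (correlation is an assumed redundancy proxy)."""
    kept = []
    for j in ranked:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold for k in kept):
            kept.append(j)
    return kept

Under these assumptions, a feature subset of size k would be obtained as remove_redundant(X, rank_by_distribution_dissimilarity(X, y))[:k].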


Bibliographic Details
Main Authors: Zhang, Yu; Lin, Zhuoyi; Kwoh, Chee Keong
Format: Online Article Text
Language: English
Published in: Computational Science – ICCS 2020
Published: 2020-05-25
Collection: PubMed (PMC7302551)
Rights: © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source, for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302551/
http://dx.doi.org/10.1007/978-3-030-50426-7_1