Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy
Feature selection is an important preprocessing step in pattern recognition. In this paper, we present a new feature selection approach for two-class classification problems based on information theory, named minimum Distribution Similarity with Removed Redundancy (mDSRR). Unlike previous methods, which use mutual information and greedy iteration with a loss function to rank features, we rank features according to the similarity of their distributions in the two classes, measured by relative entropy, and then remove highly redundant features from the sorted feature subset. Experimental results on datasets from a variety of fields with different classifiers highlight the value of mDSRR for selecting feature subsets, especially small ones. mDSRR is also shown to outperform other state-of-the-art methods in most cases. In addition, we observed that mutual information may not be a good criterion for selecting the initial feature in methods with subsequent iterations.
Main authors: | Zhang, Yu; Lin, Zhuoyi; Kwoh, Chee Keong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302551/ http://dx.doi.org/10.1007/978-3-030-50426-7_1 |
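As a rough illustration of the approach described in the abstract, the sketch below ranks features by how dissimilar their class-conditional distributions are, using relative entropy, and then greedily drops redundant features. Note the assumptions: the histogram binning, the symmetrised KL divergence, and the Pearson-correlation redundancy test are illustrative choices, not the exact mDSRR procedure from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions
    (eps-smoothed to avoid division by zero in empty bins)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def rank_features_by_class_divergence(X, y, bins=10):
    """Rank features by the symmetric relative entropy between their
    histograms in class 0 and class 1; a larger value means the two
    class distributions are less similar, i.e. more discriminative."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        edges = np.histogram_bin_edges(col, bins=bins)  # shared edges
        h0, _ = np.histogram(col[y == 0], bins=edges)
        h1, _ = np.histogram(col[y == 1], bins=edges)
        scores.append(kl_divergence(h0, h1) + kl_divergence(h1, h0))
    return np.argsort(scores)[::-1]  # most discriminative first

def drop_redundant(X, ranked, corr_threshold=0.95):
    """Walk the ranked list and keep a feature only if its absolute
    Pearson correlation with every already-kept feature stays below
    the threshold (a stand-in for the paper's redundancy-removal step)."""
    kept = []
    for j in ranked:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold
               for k in kept):
            kept.append(j)
    return kept
```

On synthetic data where one feature separates the classes, a second is noise, and a third duplicates the first, the ranking places the noise feature last and the redundancy filter keeps only one of the two duplicated features.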
_version_ | 1783547869445226496 |
author | Zhang, Yu Lin, Zhuoyi Kwoh, Chee Keong |
author_facet | Zhang, Yu Lin, Zhuoyi Kwoh, Chee Keong |
author_sort | Zhang, Yu |
collection | PubMed |
description | Feature selection is an important preprocessing step in pattern recognition. In this paper, we present a new feature selection approach for two-class classification problems based on information theory, named minimum Distribution Similarity with Removed Redundancy (mDSRR). Unlike previous methods, which use mutual information and greedy iteration with a loss function to rank features, we rank features according to the similarity of their distributions in the two classes, measured by relative entropy, and then remove highly redundant features from the sorted feature subset. Experimental results on datasets from a variety of fields with different classifiers highlight the value of mDSRR for selecting feature subsets, especially small ones. mDSRR is also shown to outperform other state-of-the-art methods in most cases. In addition, we observed that mutual information may not be a good criterion for selecting the initial feature in methods with subsequent iterations. |
format | Online Article Text |
id | pubmed-7302551 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7302551 2020-06-19 Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy Zhang, Yu Lin, Zhuoyi Kwoh, Chee Keong Computational Science – ICCS 2020 Article Feature selection is an important preprocessing step in pattern recognition. In this paper, we present a new feature selection approach for two-class classification problems based on information theory, named minimum Distribution Similarity with Removed Redundancy (mDSRR). Unlike previous methods, which use mutual information and greedy iteration with a loss function to rank features, we rank features according to the similarity of their distributions in the two classes, measured by relative entropy, and then remove highly redundant features from the sorted feature subset. Experimental results on datasets from a variety of fields with different classifiers highlight the value of mDSRR for selecting feature subsets, especially small ones. mDSRR is also shown to outperform other state-of-the-art methods in most cases. In addition, we observed that mutual information may not be a good criterion for selecting the initial feature in methods with subsequent iterations. 2020-05-25 /pmc/articles/PMC7302551/ http://dx.doi.org/10.1007/978-3-030-50426-7_1 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Zhang, Yu Lin, Zhuoyi Kwoh, Chee Keong Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy |
title | Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy |
title_full | Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy |
title_fullStr | Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy |
title_full_unstemmed | Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy |
title_short | Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy |
title_sort | information theory-based feature selection: minimum distribution similarity with removed redundancy |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302551/ http://dx.doi.org/10.1007/978-3-030-50426-7_1 |
work_keys_str_mv | AT zhangyu informationtheorybasedfeatureselectionminimumdistributionsimilaritywithremovedredundancy AT linzhuoyi informationtheorybasedfeatureselectionminimumdistributionsimilaritywithremovedredundancy AT kwohcheekeong informationtheorybasedfeatureselectionminimumdistributionsimilaritywithremovedredundancy |