
Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets

The complexity and high dimensionality are the inherent concerns of big data. The role of feature selection has gained prime importance to cope with the issue by reducing dimensionality of datasets. The compromise between the maximum classification accuracy and the minimum dimensions is as yet an un...


Bibliographic Details
Main authors: Chaudhry, Muhammad Umar; Yasir, Muhammad; Asghar, Muhammad Nabeel; Lee, Jee-Hyong
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597188/
https://www.ncbi.nlm.nih.gov/pubmed/33286862
http://dx.doi.org/10.3390/e22101093
_version_ 1783602286392508416
author Chaudhry, Muhammad Umar
Yasir, Muhammad
Asghar, Muhammad Nabeel
Lee, Jee-Hyong
collection PubMed
description Complexity and high dimensionality are inherent concerns of big data. Feature selection has gained prime importance in coping with this issue by reducing the dimensionality of datasets. The compromise between maximum classification accuracy and minimum dimensionality remains an unsolved puzzle. Recently, Monte Carlo Tree Search (MCTS)-based techniques have been proposed that have attained great success in feature selection by constructing a binary feature selection tree and efficiently focusing on the most valuable features in the feature space. However, one challenging problem associated with such approaches is the tradeoff between the tree search and the number of simulations. With a limited number of simulations, the tree might not reach sufficient depth, which biases feature subset selection towards randomness. In this paper, a new algorithm for feature selection is proposed in which multiple feature selection trees are built iteratively in a recursive fashion. The state space of every successor feature selection tree is smaller than that of its predecessor, which increases the impact of tree search in selecting the best features while keeping the number of MCTS simulations fixed. Experiments are performed on 16 benchmark datasets for validation. We also compare the performance with state-of-the-art methods in the literature in terms of both classification accuracy and feature selection ratio.
format Online
Article
Text
id pubmed-7597188
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7597188 2020-11-09 Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets Chaudhry, Muhammad Umar Yasir, Muhammad Asghar, Muhammad Nabeel Lee, Jee-Hyong Entropy (Basel) Article MDPI 2020-09-29 /pmc/articles/PMC7597188/ /pubmed/33286862 http://dx.doi.org/10.3390/e22101093 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets
topic Article
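
Illustrative sketch: The abstract describes a binary feature selection tree searched with MCTS, with successive trees built recursively over a shrinking feature set under a fixed simulation budget. The Python sketch below is one possible reading of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the names `Node`, `mcts_select`, `recursive_mcts`, and `toy_reward`, the UCT constant, the random rollout policy, the tie-breaking rule, and the stopping rule (stop when a tree no longer returns a smaller subset). The record does not specify the paper's reward; the toy reward here stands in for a classifier-based score.

```python
import math
import random


class Node:
    """One node of a binary feature selection tree (hypothetical structure)."""

    def __init__(self, depth, chosen, parent=None):
        self.depth = depth      # number of features already decided on this path
        self.chosen = chosen    # tuple of features included so far
        self.parent = parent
        self.children = {}      # action (True = include, False = skip) -> Node
        self.visits = 0
        self.total = 0.0


def uct_child(node, c):
    """Return the child with the highest UCT score."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts_select(features, reward, n_sim, rng, c=1.4):
    """Search one tree over `features`; return the best subset seen and its score."""
    root = Node(0, ())
    best_subset, best_score = tuple(features), reward(tuple(features))
    for _ in range(n_sim):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while node.depth < len(features) and len(node.children) == 2:
            node = uct_child(node, c)
        # Expansion: add one untried include/skip decision for the next feature.
        if node.depth < len(features):
            action = rng.choice([a for a in (True, False) if a not in node.children])
            feat = features[node.depth]
            child = Node(node.depth + 1, node.chosen + ((feat,) if action else ()), node)
            node.children[action] = child
            node = child
        # Simulation: decide the remaining features uniformly at random.
        rest = features[node.depth:]
        subset = node.chosen + tuple(f for f in rest if rng.random() < 0.5)
        score = reward(subset) if subset else 0.0
        # Assumed tie-break: prefer the smaller subset at equal score.
        if subset and (score > best_score
                       or (score == best_score and len(subset) < len(best_subset))):
            best_subset, best_score = subset, score
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best_subset, best_score


def recursive_mcts(features, reward, n_sim=200, seed=0):
    """Build successive trees, each over the subset returned by the previous one."""
    rng = random.Random(seed)
    current = tuple(features)
    current_score = reward(current)
    while len(current) > 1:
        subset, score = mcts_select(current, reward, n_sim, rng)
        if len(subset) >= len(current):  # assumed stopping rule: no further shrinkage
            break
        current, current_score = subset, score
    return current, current_score


def toy_reward(subset):
    """Toy stand-in for classification accuracy: features 0 and 3 are informative."""
    informative = {0, 3}
    return len(informative & set(subset)) - 0.05 * len(subset)
```

Calling `recursive_mcts(range(8), toy_reward)` runs the sketch on eight hypothetical features. Because each tree starts from the full current set as its baseline, the returned subset never scores below the set it was derived from, and each later tree searches a strictly smaller state space with the same `n_sim` budget.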