Cargando…

An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection

In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimum for a given cost function. One possibility to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fa...

Descripción completa

Detalles Bibliográficos
Autores principales: Estrela, Gustavo, Gubitoso, Marco Dimas, Ferreira, Carlos Eduardo, Barrera, Junior, Reis, Marcelo S.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516975/
https://www.ncbi.nlm.nih.gov/pubmed/33286261
http://dx.doi.org/10.3390/e22040492
_version_ 1783587122108694528
author Estrela, Gustavo
Gubitoso, Marco Dimas
Ferreira, Carlos Eduardo
Barrera, Junior
Reis, Marcelo S.
author_facet Estrela, Gustavo
Gubitoso, Marco Dimas
Ferreira, Carlos Eduardo
Barrera, Junior
Reis, Marcelo S.
author_sort Estrela, Gustavo
collection PubMed
description In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimum for a given cost function. One possibility to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs of chains in that lattice describe U-shaped curves. Minimization of such cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite of the algorithm optimality, the UCS required time in computational assays was exponential on the number of features. Here, we report that such scalability issue arises due to the fact that the U-curve problem is NP-hard. In the sequence, we introduce the Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, thus rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, where the PUCS performance was assessed against UCS and other golden standard algorithms in feature selection.
format Online
Article
Text
id pubmed-7516975
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75169752020-11-09 An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection Estrela, Gustavo Gubitoso, Marco Dimas Ferreira, Carlos Eduardo Barrera, Junior Reis, Marcelo S. Entropy (Basel) Article In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimum for a given cost function. One possibility to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs of chains in that lattice describe U-shaped curves. Minimization of such cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite of the algorithm optimality, the UCS required time in computational assays was exponential on the number of features. Here, we report that such scalability issue arises due to the fact that the U-curve problem is NP-hard. In the sequence, we introduce the Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, thus rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, where the PUCS performance was assessed against UCS and other golden standard algorithms in feature selection. MDPI 2020-04-24 /pmc/articles/PMC7516975/ /pubmed/33286261 http://dx.doi.org/10.3390/e22040492 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Estrela, Gustavo
Gubitoso, Marco Dimas
Ferreira, Carlos Eduardo
Barrera, Junior
Reis, Marcelo S.
An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection
title An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection
title_full An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection
title_fullStr An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection
title_full_unstemmed An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection
title_short An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection
title_sort efficient, parallelized algorithm for optimal conditional entropy-based feature selection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516975/
https://www.ncbi.nlm.nih.gov/pubmed/33286261
http://dx.doi.org/10.3390/e22040492
work_keys_str_mv AT estrelagustavo anefficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT gubitosomarcodimas anefficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT ferreiracarloseduardo anefficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT barrerajunior anefficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT reismarcelos anefficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT estrelagustavo efficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT gubitosomarcodimas efficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT ferreiracarloseduardo efficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT barrerajunior efficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection
AT reismarcelos efficientparallelizedalgorithmforoptimalconditionalentropybasedfeatureselection