Extending greedy feature selection algorithms to multiple solutions

Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under...

Bibliographic Details
Main authors: Borboudakis, Giorgos; Tsamardinos, Ioannis
Format: Online Article Text
Language: English
Published: Springer US 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550441/
https://www.ncbi.nlm.nih.gov/pubmed/34720675
http://dx.doi.org/10.1007/s10618-020-00731-7
_version_ 1784590961138991104
author Borboudakis, Giorgos
Tsamardinos, Ioannis
author_facet Borboudakis, Giorgos
Tsamardinos, Ioannis
author_sort Borboudakis, Giorgos
collection PubMed
description Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance.
format Online
Article
Text
id pubmed-8550441
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-8550441 2021-10-29 Extending greedy feature selection algorithms to multiple solutions Borboudakis, Giorgos Tsamardinos, Ioannis Data Min Knowl Discov Article Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance. Springer US 2021-05-01 2021 /pmc/articles/PMC8550441/ /pubmed/34720675 http://dx.doi.org/10.1007/s10618-020-00731-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Borboudakis, Giorgos
Tsamardinos, Ioannis
Extending greedy feature selection algorithms to multiple solutions
title Extending greedy feature selection algorithms to multiple solutions
title_sort extending greedy feature selection algorithms to multiple solutions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550441/
https://www.ncbi.nlm.nih.gov/pubmed/34720675
http://dx.doi.org/10.1007/s10618-020-00731-7
work_keys_str_mv AT borboudakisgiorgos extendinggreedyfeatureselectionalgorithmstomultiplesolutions
AT tsamardinosioannis extendinggreedyfeatureselectionalgorithmstomultiplesolutions