
Feature Selection based on the Local Lift Dependence Scale

This paper uses a classical approach to feature selection: minimization of a cost function applied to estimated joint distributions. However, in this new formulation, the optimization search space is extended. The original search space is the Boolean lattice of feature sets (BLFS), while the extend...


Bibliographic Details
Main authors: Marcondes, Diego; Simonis, Adilson; Barrera, Junior
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512664/
https://www.ncbi.nlm.nih.gov/pubmed/33265188
http://dx.doi.org/10.3390/e20020097
author Marcondes, Diego
Simonis, Adilson
Barrera, Junior
collection PubMed
description This paper uses a classical approach to feature selection: minimization of a cost function applied to estimated joint distributions. However, in this new formulation, the optimization search space is extended. The original search space is the Boolean lattice of feature sets (BLFS), while the extended one is a collection of Boolean lattices of ordered pairs (CBLOP), that is, pairs of the form (features, associated value), indexed by the elements of the BLFS. In this approach, we may not only select the features that are most related to a variable Y, but also select the values of the features that most influence the variable or that are most prone to have a specific value of Y. A local formulation of Shannon’s mutual information, which generalizes Shannon’s original definition, is applied to a CBLOP to generate a multiple resolution scale for characterizing variable dependence, the Local Lift Dependence Scale (LLDS). The main contribution of this paper is to define and apply the LLDS to analyse local properties of joint distributions that are neglected by Shannon’s classical global measure, in order to select features. This approach is applied to select features based on the dependence between: (i) the performance of students on university entrance exams and on courses of their first semester in the university; (ii) the party of a congress representative and his or her vote on different matters; (iii) the cover type of terrains and several terrain properties.
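The local view of dependence described in the abstract can be illustrated with a small sketch. The record itself gives no formulas, so the code below assumes the standard lift ratio, L(x, y) = P(x, y) / (P(x) P(y)), and uses the well-known identity that Shannon's mutual information is the expectation of log L under the joint distribution. The function names and the toy sample are hypothetical, and this is not the authors' implementation of the LLDS.

```python
import math
from collections import Counter


def lift_table(pairs):
    """Estimate the lift P(x, y) / (P(x) * P(y)) for every observed (x, y) pair.

    Assumption: plain relative frequencies are used as probability estimates.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return {
        (x, y): (count / n) / ((px[x] / n) * (py[y] / n))
        for (x, y), count in joint.items()
    }


def mutual_information(pairs):
    """Global mutual information (in nats) as the joint-weighted mean of log lift."""
    n = len(pairs)
    joint = Counter(pairs)
    lift = lift_table(pairs)
    return sum((count / n) * math.log(lift[xy]) for xy, count in joint.items())


# Hypothetical toy sample: feature value "a" tends to occur with Y = 1.
sample = [("a", 1)] * 4 + [("a", 0)] * 1 + [("b", 1)] * 1 + [("b", 0)] * 4
lift = lift_table(sample)
mi = mutual_information(sample)

# Hypothetical independent sample: every lift equals 1 and the global measure is 0.
independent = [("a", 1), ("a", 0), ("b", 1), ("b", 0)]
lift_ind = lift_table(independent)
mi_ind = mutual_information(independent)
```

A lift above 1 marks a (feature value, Y value) pair that occurs more often than independence would predict, and a lift below 1 marks one that occurs less often. The global measure averages these local values into a single number, which is why it can hide the pair-level structure the abstract refers to.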
format Online
Article
Text
id pubmed-7512664
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7512664
2020-11-09
Feature Selection based on the Local Lift Dependence Scale
Marcondes, Diego
Simonis, Adilson
Barrera, Junior
Entropy (Basel)
Article
MDPI
2018-01-30
/pmc/articles/PMC7512664/
/pubmed/33265188
http://dx.doi.org/10.3390/e20020097
Text
en
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Feature Selection based on the Local Lift Dependence Scale
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512664/
https://www.ncbi.nlm.nih.gov/pubmed/33265188
http://dx.doi.org/10.3390/e20020097