Utility metric for unsupervised feature selection
Feature selection techniques are very useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the dimensions of the data to a subset of the original set of features. When the data lack annotations, unsupervised feature selectors are required for their analysis.
Main Authors: | Villa, Amalia; Mundanad Narayanan, Abhijith; Van Huffel, Sabine; Bertrand, Alexander; Varon, Carolina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2021 |
Subjects: | Algorithms and Analysis of Algorithms |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080425/ https://www.ncbi.nlm.nih.gov/pubmed/33981839 http://dx.doi.org/10.7717/peerj-cs.477 |
_version_ | 1783685423171633152 |
---|---|
author | Villa, Amalia Mundanad Narayanan, Abhijith Van Huffel, Sabine Bertrand, Alexander Varon, Carolina |
author_facet | Villa, Amalia Mundanad Narayanan, Abhijith Van Huffel, Sabine Bertrand, Alexander Varon, Carolina |
author_sort | Villa, Amalia |
collection | PubMed |
description | Feature selection techniques are very useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the dimensions of the data to a subset of the original set of features. When the data lack annotations, unsupervised feature selectors are required for their analysis. Several algorithms for this aim exist in the literature, but despite their large applicability, they can be very inaccessible or cumbersome to use, mainly due to the need for tuning non-intuitive parameters and the high computational demands. In this work, a publicly available ready-to-use unsupervised feature selector is proposed, with comparable results to the state-of-the-art at a much lower computational cost. The suggested approach belongs to the methods known as spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted, while in the second stage a subset of the features is selected to replicate these structures. This paper suggests two contributions to this field, related to each of the stages involved. In the manifold learning stage, the effect of non-linearities in the data is explored, making use of a radial basis function (RBF) kernel, for which an alternative solution for the estimation of the kernel parameter is presented for cases with high-dimensional data. Additionally, the use of a backwards greedy approach based on the least-squares utility metric for the subset selection stage is proposed. The combination of these new ingredients results in the utility metric for unsupervised feature selection U2FS algorithm. The proposed U2FS algorithm succeeds in selecting the correct features in a simulation environment. In addition, the performance of the method on benchmark datasets is comparable to the state-of-the-art, while requiring less computational time. Moreover, unlike the state-of-the-art, U2FS does not require any tuning of parameters. |
format | Online Article Text |
id | pubmed-8080425 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-80804252021-05-11 Utility metric for unsupervised feature selection Villa, Amalia Mundanad Narayanan, Abhijith Van Huffel, Sabine Bertrand, Alexander Varon, Carolina PeerJ Comput Sci Algorithms and Analysis of Algorithms Feature selection techniques are very useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the dimensions of the data to a subset of the original set of features. When the data lack annotations, unsupervised feature selectors are required for their analysis. Several algorithms for this aim exist in the literature, but despite their large applicability, they can be very inaccessible or cumbersome to use, mainly due to the need for tuning non-intuitive parameters and the high computational demands. In this work, a publicly available ready-to-use unsupervised feature selector is proposed, with comparable results to the state-of-the-art at a much lower computational cost. The suggested approach belongs to the methods known as spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted, while in the second stage a subset of the features is selected to replicate these structures. This paper suggests two contributions to this field, related to each of the stages involved. In the manifold learning stage, the effect of non-linearities in the data is explored, making use of a radial basis function (RBF) kernel, for which an alternative solution for the estimation of the kernel parameter is presented for cases with high-dimensional data. Additionally, the use of a backwards greedy approach based on the least-squares utility metric for the subset selection stage is proposed. The combination of these new ingredients results in the utility metric for unsupervised feature selection U2FS algorithm. The proposed U2FS algorithm succeeds in selecting the correct features in a simulation environment. In addition, the performance of the method on benchmark datasets is comparable to the state-of-the-art, while requiring less computational time. Moreover, unlike the state-of-the-art, U2FS does not require any tuning of parameters. PeerJ Inc. 2021-04-21 /pmc/articles/PMC8080425/ /pubmed/33981839 http://dx.doi.org/10.7717/peerj-cs.477 Text en © 2021 Villa et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Algorithms and Analysis of Algorithms Villa, Amalia Mundanad Narayanan, Abhijith Van Huffel, Sabine Bertrand, Alexander Varon, Carolina Utility metric for unsupervised feature selection |
title | Utility metric for unsupervised feature selection |
title_full | Utility metric for unsupervised feature selection |
title_fullStr | Utility metric for unsupervised feature selection |
title_full_unstemmed | Utility metric for unsupervised feature selection |
title_short | Utility metric for unsupervised feature selection |
title_sort | utility metric for unsupervised feature selection |
topic | Algorithms and Analysis of Algorithms |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080425/ https://www.ncbi.nlm.nih.gov/pubmed/33981839 http://dx.doi.org/10.7717/peerj-cs.477 |
work_keys_str_mv | AT villaamalia utilitymetricforunsupervisedfeatureselection AT mundanadnarayananabhijith utilitymetricforunsupervisedfeatureselection AT vanhuffelsabine utilitymetricforunsupervisedfeatureselection AT bertrandalexander utilitymetricforunsupervisedfeatureselection AT varoncarolina utilitymetricforunsupervisedfeatureselection |
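The two-stage pipeline the abstract describes (manifold learning, then subset selection) can be sketched in a few lines of numpy. The sketch below is a hypothetical illustration of the general spectral feature selection idea, not the authors' released U2FS implementation: the function names, the fixed kernel parameter `sigma`, the number of embedding dimensions `k`, and the small ridge term are all assumptions (the paper's contribution includes estimating the kernel parameter differently for high-dimensional data). The selection stage uses the standard least-squares utility of a feature, i.e. the increase in residual cost if that feature's column were dropped and the fit re-optimized, which for a regularized normal-equations solve reduces to `||w_j||^2 / (G^{-1})_{jj}`.

```python
import numpy as np

def rbf_kernel(X, sigma):
    # Pairwise RBF affinities between samples (rows of X).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def manifold_embedding(X, sigma, k):
    # Stage 1: extract structure via the eigenvectors of the
    # (unnormalized) graph Laplacian of the RBF affinity graph.
    K = rbf_kernel(X, sigma)
    L = np.diag(K.sum(axis=1)) - K
    _, vecs = np.linalg.eigh(L)       # eigenvalues ascending
    return vecs[:, 1:k + 1]           # skip the trivial constant eigenvector

def greedy_utility_select(X, n_select, sigma=1.0, k=3, ridge=1e-6):
    # Stage 2: backward greedy elimination. Repeatedly drop the feature
    # whose removal least degrades the least-squares fit of the
    # embedding Y from the remaining feature columns.
    Y = manifold_embedding(X, sigma, k)
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_select:
        Xs = X[:, remaining]
        G = Xs.T @ Xs + ridge * np.eye(len(remaining))
        Ginv = np.linalg.inv(G)
        W = Ginv @ Xs.T @ Y
        # Utility of feature j: cost increase if column j is removed.
        util = np.sum(W**2, axis=1) / np.diag(Ginv)
        remaining.pop(int(np.argmin(util)))
    return remaining
```

For example, `greedy_utility_select(X, n_select=3)` on a 50-by-6 data matrix returns the indices of the three retained features. The backward greedy order means each elimination re-solves one regularized least-squares problem, which is what keeps this family of selectors cheap relative to wrapper methods.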