A Sparse-Modeling Based Approach for Class Specific Feature Selection

Bibliographic Details
Main authors: Nardone, Davide; Ciaramella, Angelo; Staiano, Antonino
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924712/
https://www.ncbi.nlm.nih.gov/pubmed/33816890
http://dx.doi.org/10.7717/peerj-cs.237

Full Description

In this work, we propose a novel feature selection framework called Sparse-Modeling Based Approach for Class Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of sparse modeling and class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology): it makes it possible to work with models that have fewer variables, which are easier to explain, provide valuable insights into the importance of each variable's role, and are likely to speed up experimental validation. Unfortunately, as the no free lunch theorems also corroborate, no single approach in the literature is always the most apt at detecting the optimal feature subset for building a final model, so feature selection still represents a challenge.

The proposed feature selection procedure follows a two-step approach: (a) a sparse modeling-based learning technique is first used to find the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built, in which each classifier is trained on its own feature subset discovered in the previous phase, and a suitable decision rule is adopted to compute the ensemble response.

To evaluate the performance of the proposed method, extensive experiments were performed on publicly available datasets, in particular from the computational biology field, where feature selection is indispensable: the acute lymphoblastic leukemia and acute myeloid leukemia, the human carcinomas, the human lung carcinomas, the diffuse large B-cell lymphoma, and the malignant glioma datasets. SMBA-CSFS is able to identify the most representative features that maximize classification accuracy. With the top 20 and top 80 features, SMBA-CSFS shows promising performance compared with its competitors from the literature on all considered datasets, especially those with a higher number of features. Experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high. For this reason, the approach presents itself as a candidate for the selection and classification of data with a large number of features and classes.
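
The two-step procedure described in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation: this record does not give the sparse-modeling formulation, the solver, the classifiers, or the decision rule, so all of them are assumptions here. For step (a) we assume a row-sparse self-representation, min_C 0.5 * ||Y - Y C||_F^2 + lam * sum_i ||C[i, :]||_2, solved by proximal gradient on each class's centered samples, with features ranked by the row norms of C (an l1,2-regularized formulation in the spirit of sparse-modeling representative selection). For step (b) we swap in a simple nearest-centroid class-vs-rest scorer per class, restricted to that class's own feature subset, and combine the scores with an argmax decision rule. The function names and the parameters `lam`, `n_iter` and `k` are hypothetical.

```python
import numpy as np


def row_sparse_self_representation(Y, lam=0.1, n_iter=300):
    """Proximal-gradient solver (assumed formulation) for
        min_C 0.5 * ||Y - Y C||_F^2 + lam * sum_i ||C[i, :]||_2,
    where Y is n_samples x n_features, so each column of Y is a feature."""
    G = Y.T @ Y                                   # feature Gram matrix (d x d)
    step = 1.0 / max(np.linalg.eigvalsh(G).max(), 1e-12)  # 1 / Lipschitz constant
    C = np.zeros_like(G)
    for _ in range(n_iter):
        Z = C - step * (G @ C - G)                # gradient step on the smooth term
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        C = shrink * Z                            # row-wise group soft-thresholding
    return C


def class_specific_subsets(X, y, k, lam=0.1):
    """Step (a), hypothetical: one top-k feature subset per class,
    ranked by the row norms of that class's coefficient matrix C."""
    subsets = {}
    for c in np.unique(y):
        Xc = X[y == c]
        C = row_sparse_self_representation(Xc - Xc.mean(axis=0), lam=lam)
        row_norms = np.linalg.norm(C, axis=1)
        subsets[c.item()] = np.argsort(row_norms)[::-1][:k]
    return subsets


def fit_ensemble(X, y, subsets):
    """Step (b), hypothetical: one class-vs-rest model per class, trained on that
    class's own subset. Stand-in model: the class centroid and the 'rest' centroid."""
    models = {}
    for c, feats in subsets.items():
        Xs = X[:, feats]
        models[c] = (feats, Xs[y == c].mean(axis=0), Xs[y != c].mean(axis=0))
    return models


def predict(models, X):
    """Assumed decision rule: each model scores how much closer a sample is to
    its class centroid than to the 'rest' centroid; the highest score wins."""
    classes = list(models)
    scores = np.column_stack([
        np.linalg.norm(X[:, f] - mu_rest, axis=1) - np.linalg.norm(X[:, f] - mu_c, axis=1)
        for f, mu_c, mu_rest in (models[c] for c in classes)
    ])
    return np.array(classes)[scores.argmax(axis=1)]


if __name__ == "__main__":
    # Synthetic two-class data: class 1 is shifted by +5 on every feature.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (30, 6)), rng.normal(5.0, 1.0, (30, 6))])
    y = np.array([0] * 30 + [1] * 30)
    subsets = class_specific_subsets(X, y, k=2)
    models = fit_ensemble(X, y, subsets)
    print(subsets, (predict(models, X) == y).mean())
```

In this sketch the self-representation step is unsupervised within each class: it keeps the features that best reconstruct the other features of that class, and only step (b) measures how useful each subset is for classification. Replacing the centroid scorer with any classifier that outputs a per-class confidence leaves the structure of the ensemble unchanged.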

Journal: PeerJ Computer Science (PeerJ Comput Sci), PeerJ Inc.
Publication date: 2019-11-18
Rights: ©2019 Nardone et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.