Cost-Constrained feature selection in binary classification: adaptations for greedy forward selection and genetic algorithms

Bibliographic Details
Main authors: Jagdhuber, Rudolf; Lang, Michel; Stenzl, Arnulf; Neuhaus, Jochen; Rahnenführer, Jörg
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986087/
https://www.ncbi.nlm.nih.gov/pubmed/31992203
http://dx.doi.org/10.1186/s12859-020-3361-9
collection PubMed
description BACKGROUND: With modern methods in biotechnology, the search for biomarkers has advanced to a challenging statistical task exploring high-dimensional data sets. Feature selection is a widely researched preprocessing step to handle huge numbers of biomarker candidates and has special importance for the analysis of biomedical data. Such data sets often include many input features not related to the diagnostic or therapeutic target variable. A less researched, but also relevant, aspect for medical applications is the cost of different biomarker candidates. These costs are often financial, but can also refer to other aspects, for example the decision between a painful biopsy marker and a simple urine test. In this paper, we propose extensions to two feature selection methods, greedy forward selection and genetic algorithms, to control the total amount of such costs. In comprehensive simulation studies of binary classification tasks, we compare the predictive performance, the run-time, and the detection rate of relevant features for the newly proposed methods and five baseline alternatives to handle budget constraints.
RESULTS: In simulations with a predefined budget constraint, our proposed methods outperform the baseline alternatives, with only minor differences between them. Only in the scenario without an actual budget constraint did our adapted greedy forward selection approach show a clear drop in performance compared to the other methods. However, introducing a hyperparameter to adapt the benefit-cost trade-off in this method could overcome this weakness.
CONCLUSIONS: In feature cost scenarios, where a total budget has to be met, common feature selection algorithms are often not suitable for identifying well-performing subsets for a modelling task. Adaptations of these algorithms, such as the ones proposed in this paper, can help to tackle this problem.
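The abstract describes adapting greedy forward selection to a total cost budget, with a hyperparameter for the benefit-cost trade-off. The following is a minimal sketch of that general idea under stated assumptions, not the authors' published algorithm: it assumes a gain-per-cost criterion `gain / cost**lam`, where `lam` is a hypothetical trade-off exponent, and a user-supplied `score` function for a feature subset (in practice, for example, a cross-validated performance estimate of a binary classifier). The function name `cost_constrained_forward_selection`, the feature names, values, and costs are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional


def cost_constrained_forward_selection(
    features: List[str],
    costs: Dict[str, float],
    budget: float,
    score: Callable[[FrozenSet[str]], float],
    lam: float = 1.0,
) -> List[str]:
    """Greedy forward selection under a total cost budget (hypothetical sketch).

    Assumes strictly positive costs. At each step, only features that still
    fit into the remaining budget are considered; among those that improve
    the score, the one with the largest gain / cost**lam is added.
    lam = 0 ignores costs except for the budget check; lam = 1 ranks by
    plain gain per unit cost.
    """
    selected: List[str] = []
    spent = 0.0
    current = score(frozenset())
    while True:
        best: Optional[str] = None
        best_ratio = 0.0
        best_score = current
        for f in features:
            if f in selected or spent + costs[f] > budget:
                continue  # already chosen, or would exceed the budget
            new_score = score(frozenset(selected + [f]))
            gain = new_score - current
            if gain <= 0:
                continue  # skip features that do not improve the score
            ratio = gain / costs[f] ** lam
            if ratio > best_ratio:
                best, best_ratio, best_score = f, ratio, new_score
        if best is None:
            return selected  # no affordable feature improves the score
        selected.append(best)
        spent += costs[best]
        current = best_score


# Hypothetical toy data: an additive score stands in for a real
# classifier performance estimate.
values = {"biopsy": 0.30, "urine": 0.20, "blood": 0.15, "age": 0.02}
costs = {"biopsy": 10.0, "urine": 2.0, "blood": 3.0, "age": 1.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(values[f] for f in subset)


print(cost_constrained_forward_selection(list(values), costs, 10.0, toy_score, lam=1.0))
# ['urine', 'blood', 'age']
print(cost_constrained_forward_selection(list(values), costs, 10.0, toy_score, lam=0.0))
# ['biopsy']
```

With `lam=0.0` the costs act only through the budget check, so in this toy case the whole budget goes to the single highest-gain feature; with `lam=1.0` three cheaper features are bought instead. Tuning `lam` is one way to express a benefit-cost trade-off; the toy example does not reproduce the paper's simulation results.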
id pubmed-6986087
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Bioinformatics, published 2020-01-28
Rights: © The Author(s) 2020. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article