
Probing for Sparse and Fast Variable Selection with Model-Based Boosting

We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need of multiple model fits on slig...


Bibliographic Details
Main authors: Thomas, Janek; Hepp, Tobias; Mayr, Andreas; Bischl, Bernd
Format: Online Article Text
Language: English
Published: Hindawi 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555005/
https://www.ncbi.nlm.nih.gov/pubmed/28831289
http://dx.doi.org/10.1155/2017/1421409
_version_ 1783256870361759744
author Thomas, Janek
Hepp, Tobias
Mayr, Andreas
Bischl, Bernd
collection PubMed
description We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need of multiple model fits on slightly altered data (e.g., cross-validation or bootstrap) to find the optimal number of boosting iterations and prevent overfitting. In our proposed approach, we augment the data set with randomly permuted versions of the true variables, so-called shadow variables, and stop the stepwise fitting as soon as such a variable would be added to the model. This allows variable selection in a single fit of the model without requiring further parameter tuning. We show that our probing approach can compete with state-of-the-art selection methods like stability selection in a high-dimensional classification benchmark and apply it on three gene expression data sets.
format Online
Article
Text
id pubmed-5555005
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-5555005
2017-08-22
Probing for Sparse and Fast Variable Selection with Model-Based Boosting
Thomas, Janek
Hepp, Tobias
Mayr, Andreas
Bischl, Bernd
Comput Math Methods Med
Research Article
Hindawi 2017
2017-07-31
/pmc/articles/PMC5555005/
/pubmed/28831289
http://dx.doi.org/10.1155/2017/1421409
Text en
Copyright © 2017 Janek Thomas et al.
https://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Probing for Sparse and Fast Variable Selection with Model-Based Boosting
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555005/
https://www.ncbi.nlm.nih.gov/pubmed/28831289
http://dx.doi.org/10.1155/2017/1421409
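
The probing idea summarized in the description field (augment the data with permuted shadow variables, then stop the stepwise fit as soon as a shadow variable would be selected) can be illustrated with a minimal sketch. This is not the authors' implementation: plain componentwise least-squares boosting stands in for model-based boosting, and the function name `probing_boost`, the step length `nu`, and the cap `max_iter` are assumptions made for illustration only.

```python
import numpy as np


def probing_boost(X, y, nu=0.1, max_iter=1000, seed=0):
    """Componentwise L2 boosting that stops at the first selected shadow variable.

    Hypothetical helper for illustration; names and defaults are assumptions.
    Returns (indices of selected true variables, coefficients, intercept).
    Coefficients refer to the mean-centered columns of X.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Shadow variables: every column permuted independently, which keeps its
    # marginal distribution but breaks any relation to the response.
    shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
    Z = np.hstack([X, shadows])
    Z = Z - Z.mean(axis=0)

    norms = (Z ** 2).sum(axis=0)
    norms[norms == 0] = np.inf  # constant columns can never be selected

    offset = y.mean()
    resid = y - offset
    coef = np.zeros(p)

    for _ in range(max_iter):
        # Fit every univariate base learner to the current residuals and
        # pick the one with the largest reduction in squared error.
        score = Z.T @ resid
        gain = score ** 2 / norms
        best = int(np.argmax(gain))

        # Probing stop rule: a shadow variable would enter the model.
        if best >= p:
            break

        step = nu * score[best] / norms[best]
        coef[best] += step
        resid -= step * Z[:, best]

    return np.flatnonzero(coef), coef, offset
```

Calling `probing_boost(X, y)` on a numeric design matrix returns the indices of the true variables that entered the model before the first shadow variable. Because the stop rule replaces the search for an optimal number of iterations, the sketch needs only one pass over the data; `max_iter` is merely a safety cap.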