
Stable Iterative Variable Selection


Bibliographic details
Main authors: Mahmoudian, Mehrad; Venäläinen, Mikko S; Klén, Riku; Elo, Laura L
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2021
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665768/
https://www.ncbi.nlm.nih.gov/pubmed/34270690
http://dx.doi.org/10.1093/bioinformatics/btab501
Collection: PubMed
Description:
MOTIVATION: The emergence of datasets with tens of thousands of features, such as high-throughput biomedical omics data, highlights the importance of reducing the feature space to a distilled subset that truly captures the signal. Such a subset benefits research and industry by helping to identify more effective biomarkers for the question at hand. Because the feature space is smaller, a good feature set also facilitates building robust predictive models with improved interpretability and better convergence of the applied method.
RESULTS: Here, we present a robust feature selection method named Stable Iterative Variable Selection (SIVS) and assess its performance on both omics and clinical data. To assess performance, we compared the number and quality of the features selected by SIVS with those selected by Least Absolute Shrinkage and Selection Operator (LASSO) regression. The results suggested that the feature set selected by SIVS was, on average, 41% smaller, without a negative effect on model performance. A similar result was observed in comparisons with Boruta and caret RFE.
AVAILABILITY AND IMPLEMENTATION: The method is implemented as an R package under the GNU General Public License v3.0 and is available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/package=sivs.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Record ID: pubmed-8665768
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (Original Papers)
Published online: 2021-07-16
Copyright: © The Author(s) 2021. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.