ExhauFS: exhaustive search-based feature selection for classification and survival regression

Feature selection is one of the main techniques used to prevent overfitting in machine learning applications. The most straightforward approach to feature selection is an exhaustive search: one can go over all possible feature combinations and pick the model with the highest accuracy. This metho...

Bibliographic Details
Main authors: Nersisyan, Stepan, Novosad, Victor, Galatenko, Alexei, Sokolov, Andrey, Bokov, Grigoriy, Konovalov, Alexander, Alekseev, Dmitry, Tonevitsky, Alexander
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8976470/
https://www.ncbi.nlm.nih.gov/pubmed/35378930
http://dx.doi.org/10.7717/peerj.13200
author Nersisyan, Stepan
Novosad, Victor
Galatenko, Alexei
Sokolov, Andrey
Bokov, Grigoriy
Konovalov, Alexander
Alekseev, Dmitry
Tonevitsky, Alexander
collection PubMed
description Feature selection is one of the main techniques used to prevent overfitting in machine learning applications. The most straightforward approach to feature selection is an exhaustive search: one can go over all possible feature combinations and pick the model with the highest accuracy. This method, together with its optimizations, has been actively used in biomedical research; however, a publicly available implementation has been missing. We present ExhauFS, a user-friendly command-line implementation of the exhaustive search approach for classification and survival regression. Aside from the tool description, we included three application examples in the manuscript to comprehensively review the implemented functionality. First, we executed ExhauFS on a toy cervical cancer dataset to illustrate basic concepts. Then, multi-cohort microarray breast cancer datasets were used to construct gene signatures for 5-year recurrence classification. The vast majority of signatures constructed by ExhauFS passed the 0.65 threshold of sensitivity and specificity on all datasets, including the validation one. Moreover, a number of gene signatures demonstrated reliable performance on an independent RNA-seq dataset without any coefficient re-tuning, i.e., turned out to be cross-platform. Finally, Cox survival regression models were used to fit isomiR signatures for overall survival prediction for patients with colorectal cancer. As in the previous example, most models passed the pre-defined concordance index threshold of 0.65 on all datasets. In both real-world scenarios (breast and colorectal cancer datasets), ExhauFS was benchmarked against state-of-the-art feature selection models, including L1-regularized sparse models. In the case of breast cancer, we were unable to construct reliable cross-platform classifiers using alternative feature selection approaches. In the case of colorectal cancer, not a single such model passed the same 0.65 threshold. Source code and documentation of ExhauFS are available on GitHub: https://github.com/s-a-nersisyan/ExhauFS.
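The exhaustive search idea described in the abstract can be illustrated with a minimal sketch. This is not the ExhauFS implementation or its interface: the function names (`loo_accuracy`, `exhaustive_search`), the nearest-centroid classifier, the leave-one-out scoring, and the toy data are all assumptions chosen so that the example runs with the Python standard library alone.

```python
from itertools import combinations


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`.

    Hypothetical scoring function used only for this illustration.
    """
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class sums over every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        # Squared Euclidean distance from the held-out sample to a class centroid.
        def dist(label):
            return sum(
                (X[i][c] - sums[label][k] / counts[label]) ** 2
                for k, c in enumerate(cols)
            )

        predicted = min(sums, key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def exhaustive_search(X, y, k):
    """Score every k-feature subset; return (best_accuracy, best_subset).

    The first subset reaching the best score is kept on ties.
    """
    best_score, best_cols = -1.0, None
    for cols in combinations(range(len(X[0])), k):
        score = loo_accuracy(X, y, cols)
        if score > best_score:
            best_score, best_cols = score, cols
    return best_score, best_cols


# Toy data (made up): only the middle column separates the two classes.
X = [
    [0.9, 1.0, 0.2],
    [0.1, 1.2, 0.8],
    [0.5, 0.9, 0.5],
    [0.2, 3.0, 0.3],
    [0.8, 3.1, 0.9],
    [0.4, 2.9, 0.4],
]
y = [0, 0, 0, 1, 1, 1]

best_score, best_cols = exhaustive_search(X, y, k=1)
print(best_score, best_cols)
```

The number of subsets grows as the binomial coefficient C(n, k), so a plain loop like this one is only practical for small n and k.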
format Online
Article
Text
id pubmed-8976470
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8976470 2022-04-03 ExhauFS: exhaustive search-based feature selection for classification and survival regression Nersisyan, Stepan Novosad, Victor Galatenko, Alexei Sokolov, Andrey Bokov, Grigoriy Konovalov, Alexander Alekseev, Dmitry Tonevitsky, Alexander PeerJ Bioinformatics PeerJ Inc. 2022-03-30 /pmc/articles/PMC8976470/ /pubmed/35378930 http://dx.doi.org/10.7717/peerj.13200 Text en ©2022 Nersisyan et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title ExhauFS: exhaustive search-based feature selection for classification and survival regression
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8976470/
https://www.ncbi.nlm.nih.gov/pubmed/35378930
http://dx.doi.org/10.7717/peerj.13200