Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE
Feature selection, which identifies a set of most informative features from the original feature space, has been widely used to simplify the predictor. Recursive feature elimination (RFE), as one of the most popular feature selection approaches, is effective in data dimension reduction and efficienc...
Main authors: | Chen, Qi; Meng, Zhaopeng; Liu, Xinyi; Jin, Qianguo; Su, Ran |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6027449/ https://www.ncbi.nlm.nih.gov/pubmed/29914084 http://dx.doi.org/10.3390/genes9060301 |
_version_ | 1783336614363136000 |
---|---|
author | Chen, Qi Meng, Zhaopeng Liu, Xinyi Jin, Qianguo Su, Ran |
author_sort | Chen, Qi |
collection | PubMed |
description | Feature selection, which identifies the most informative features in the original feature space, is widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective at reducing data dimensionality and improving efficiency. RFE produces a ranking of features as well as candidate subsets with their corresponding accuracy. The subset with the highest accuracy (HA) or with a preset number of features (PreNum) is often used as the final subset. However, the former may lead to a large number of features being selected, and the latter makes the final choice ambiguous and subjective when there is no prior knowledge about the preset number. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work to explore the decision variant after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants to automatically select the optimal feature subset. A random forest (RF)-based recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two entirely different molecular biology datasets, one for a toxicogenomic study and the other for protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE. |
format | Online Article Text |
id | pubmed-6027449 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-60274492018-07-13 Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE Chen, Qi Meng, Zhaopeng Liu, Xinyi Jin, Qianguo Su, Ran Genes (Basel) Article Feature selection, which identifies a set of most informative features from the original feature space, has been widely used to simplify the predictor. Recursive feature elimination (RFE), as one of the most popular feature selection approaches, is effective in data dimension reduction and efficiency increase. A ranking of features, as well as candidate subsets with the corresponding accuracy, is produced through RFE. The subset with highest accuracy (HA) or a preset number of features (PreNum) are often used as the final subset. However, this may lead to a large number of features being selected, or if there is no prior knowledge about this preset number, it is often ambiguous and subjective regarding final subset selection. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work to explore the decision variant after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants to automatically select the optimal feature subset. Random forest (RF)-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two totally different molecular biology datasets, one for a toxicogenomic study and the other one for protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE. MDPI 2018-06-15 /pmc/articles/PMC6027449/ /pubmed/29914084 http://dx.doi.org/10.3390/genes9060301 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Chen, Qi Meng, Zhaopeng Liu, Xinyi Jin, Qianguo Su, Ran Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE |
title | Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE |
title_sort | decision variants for the automatic determination of optimal feature subset in rf-rfe |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6027449/ https://www.ncbi.nlm.nih.gov/pubmed/29914084 http://dx.doi.org/10.3390/genes9060301 |
work_keys_str_mv | AT chenqi decisionvariantsfortheautomaticdeterminationofoptimalfeaturesubsetinrfrfe AT mengzhaopeng decisionvariantsfortheautomaticdeterminationofoptimalfeaturesubsetinrfrfe AT liuxinyi decisionvariantsfortheautomaticdeterminationofoptimalfeaturesubsetinrfrfe AT jinqianguo decisionvariantsfortheautomaticdeterminationofoptimalfeaturesubsetinrfrfe AT suran decisionvariantsfortheautomaticdeterminationofoptimalfeaturesubsetinrfrfe |
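
The description field above names two common rules for picking a final subset from the candidates that RFE produces: highest accuracy (HA) and a preset number of features (PreNum). The following is a minimal sketch of those two baseline rules only, not of the decision variants the article proposes. The `Candidate` type, the function names, the sample accuracies, and the tie-breaking toward fewer features are all assumptions made for illustration; none of them come from the record.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One candidate subset from an RFE run (hypothetical structure)."""
    n_features: int   # size of the candidate subset
    accuracy: float   # accuracy measured for that subset


def select_highest_accuracy(candidates: List[Candidate]) -> Candidate:
    """HA rule: take the candidate with the highest accuracy.

    Ties are broken toward fewer features; that tie-break is an
    assumption of this sketch, not something stated in the record.
    """
    return max(candidates, key=lambda c: (c.accuracy, -c.n_features))


def select_preset_number(candidates: List[Candidate], preset: int) -> Candidate:
    """PreNum rule: take the candidate whose size equals a preset number."""
    for candidate in candidates:
        if candidate.n_features == preset:
            return candidate
    raise ValueError(f"no candidate subset with {preset} features")


# Made-up values standing in for the output of an RFE run.
candidates = [
    Candidate(50, 0.81),
    Candidate(40, 0.84),
    Candidate(30, 0.84),
    Candidate(20, 0.83),
    Candidate(10, 0.78),
]

ha = select_highest_accuracy(candidates)
pre = select_preset_number(candidates, preset=20)
print("HA choice:", ha)
print("PreNum choice:", pre)
```

With these made-up numbers, HA keeps 30 features while a preset of 20 gives up only a small amount of accuracy. This illustrates the trade-off the description raises: HA can retain many features, and PreNum depends on a number chosen in advance.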