Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE

Feature selection, which identifies a set of most informative features from the original feature space, has been widely used to simplify the predictor. Recursive feature elimination (RFE), as one of the most popular feature selection approaches, is effective in data dimension reduction and efficienc...

Bibliographic Details
Main Authors: Chen, Qi; Meng, Zhaopeng; Liu, Xinyi; Jin, Qianguo; Su, Ran
Format: Online Article Text
Language: English
Published: MDPI 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6027449/
https://www.ncbi.nlm.nih.gov/pubmed/29914084
http://dx.doi.org/10.3390/genes9060301
author Chen, Qi
Meng, Zhaopeng
Liu, Xinyi
Jin, Qianguo
Su, Ran
collection PubMed
description Feature selection, which identifies a set of the most informative features from the original feature space, has been widely used to simplify the predictor. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective at reducing data dimensionality and improving efficiency. RFE produces a ranking of features as well as candidate subsets with their corresponding accuracy. The subset with the highest accuracy (HA) or with a preset number of features (PreNum) is often used as the final subset. However, HA may lead to a large number of features being selected, and without prior knowledge of a suitable preset number, the final subset selection under PreNum is often ambiguous and subjective. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work to explore the decision variant applied after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants for automatically selecting the optimal feature subset. The random forest (RF)-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two very different molecular biology datasets, one for a toxicogenomic study and the other for protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
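The two baseline decision rules named in the abstract, HA and PreNum, can be illustrated with a minimal sketch. It assumes RFE has already produced a list of candidate subsets with accuracies. The `Candidate` type, the function names, the tie-breaking toward the smaller subset, and the accuracy values are all hypothetical and chosen for illustration; they are not taken from the article.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One candidate subset from an RFE run (hypothetical structure)."""
    features: Tuple[str, ...]  # features kept at this elimination step
    accuracy: float            # accuracy estimated for this subset


def select_highest_accuracy(candidates: List[Candidate]) -> Candidate:
    """HA rule: take the most accurate subset.

    Assumption: ties are broken in favour of the smaller subset.
    """
    return max(candidates, key=lambda c: (c.accuracy, -len(c.features)))


def select_preset_number(candidates: List[Candidate], n_features: int) -> Candidate:
    """PreNum rule: take the subset that has exactly n_features features."""
    for candidate in candidates:
        if len(candidate.features) == n_features:
            return candidate
    raise ValueError(f"no candidate subset with {n_features} features")


# Invented example values, not results from the article.
candidates = [
    Candidate(("f1", "f2", "f3", "f4"), 0.90),
    Candidate(("f1", "f2", "f3"), 0.92),
    Candidate(("f1", "f2"), 0.92),
    Candidate(("f1",), 0.85),
]

ha_choice = select_highest_accuracy(candidates)
prenum_choice = select_preset_number(candidates, 3)
print("HA:", ha_choice.features, ha_choice.accuracy)
print("PreNum(3):", prenum_choice.features, prenum_choice.accuracy)
```

In this sketch the PreNum result depends entirely on the caller's choice of `n_features`, which is the subjectivity the abstract points out.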
format Online
Article
Text
id pubmed-6027449
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6027449 2018-07-13 Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE Chen, Qi Meng, Zhaopeng Liu, Xinyi Jin, Qianguo Su, Ran Genes (Basel) Article MDPI 2018-06-15 /pmc/articles/PMC6027449/ /pubmed/29914084 http://dx.doi.org/10.3390/genes9060301 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6027449/
https://www.ncbi.nlm.nih.gov/pubmed/29914084
http://dx.doi.org/10.3390/genes9060301