Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning

Feature Engineering (FE) is one of the most beneficial, yet most difficult and time-consuming tasks of machine learning projects, and requires strong expert knowledge. It is thus significant to design generalized ways to perform FE. The primary difficulties arise from the multiform information to co...

Bibliographic Details
Main Authors: Zhang, Jianyu, Hao, Jianye, Fogelman-Soulié, Françoise
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/
http://dx.doi.org/10.1007/978-3-030-47426-3_63
_version_ 1783530363059961856
author Zhang, Jianyu
Hao, Jianye
Fogelman-Soulié, Françoise
author_facet Zhang, Jianyu
Hao, Jianye
Fogelman-Soulié, Françoise
author_sort Zhang, Jianyu
collection PubMed
description Feature Engineering (FE) is one of the most beneficial, yet most difficult and time-consuming tasks of machine learning projects, and requires strong expert knowledge. It is thus significant to design generalized ways to perform FE. The primary difficulties arise from the multiform information to consider, the potentially infinite number of possible features and the high computational cost of feature generation and evaluation. We present a framework called Cross-data Automatic Feature Engineering Machine (CAFEM), which formalizes the FE problem as an optimization problem over a Feature Transformation Graph (FTG). CAFEM contains two components: a FE learner (FeL) that learns fine-grained FE strategies on one single dataset by Double Deep Q-learning (DDQN) and a Cross-data Component (CdC) that speeds up FE learning on an unseen dataset by the generalized FE policies learned by Meta-Learning on a collection of datasets. We compare the performance of FeL with several existing state-of-the-art automatic FE techniques on a large collection of datasets. It shows that FeL outperforms existing approaches and is robust on the selection of learning algorithms. Further experiments also show that CdC can not only speed up FE learning but also increase learning performance.
format Online
Article
Text
id pubmed-7206177
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-72061772020-05-08 Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning Zhang, Jianyu Hao, Jianye Fogelman-Soulié, Françoise Advances in Knowledge Discovery and Data Mining Article Feature Engineering (FE) is one of the most beneficial, yet most difficult and time-consuming tasks of machine learning projects, and requires strong expert knowledge. It is thus significant to design generalized ways to perform FE. The primary difficulties arise from the multiform information to consider, the potentially infinite number of possible features and the high computational cost of feature generation and evaluation. We present a framework called Cross-data Automatic Feature Engineering Machine (CAFEM), which formalizes the FE problem as an optimization problem over a Feature Transformation Graph (FTG). CAFEM contains two components: a FE learner (FeL) that learns fine-grained FE strategies on one single dataset by Double Deep Q-learning (DDQN) and a Cross-data Component (CdC) that speeds up FE learning on an unseen dataset by the generalized FE policies learned by Meta-Learning on a collection of datasets. We compare the performance of FeL with several existing state-of-the-art automatic FE techniques on a large collection of datasets. It shows that FeL outperforms existing approaches and is robust on the selection of learning algorithms. Further experiments also show that CdC can not only speed up FE learning but also increase learning performance. 2020-04-17 /pmc/articles/PMC7206177/ http://dx.doi.org/10.1007/978-3-030-47426-3_63 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Zhang, Jianyu
Hao, Jianye
Fogelman-Soulié, Françoise
Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning
title Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning
title_full Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning
title_fullStr Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning
title_full_unstemmed Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning
title_short Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning
title_sort cross-data automatic feature engineering via meta-learning and reinforcement learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/
http://dx.doi.org/10.1007/978-3-030-47426-3_63
work_keys_str_mv AT zhangjianyu crossdataautomaticfeatureengineeringviametalearningandreinforcementlearning
AT haojianye crossdataautomaticfeatureengineeringviametalearningandreinforcementlearning
AT fogelmansouliefrancoise crossdataautomaticfeatureengineeringviametalearningandreinforcementlearning