A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization

Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters for users to manually select the appropriate parameter values according to their own experience. However, due to the la...

Bibliographic Details
Main Authors: Huang, Xu, Zhang, Hong, Zhai, Xiaomeng
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371413/
https://www.ncbi.nlm.nih.gov/pubmed/35957487
http://dx.doi.org/10.3390/s22155930
_version_ 1784767133648945152
author Huang, Xu
Zhang, Hong
Zhai, Xiaomeng
author_facet Huang, Xu
Zhang, Hong
Zhai, Xiaomeng
author_sort Huang, Xu
collection PubMed
description Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters for users to manually select the appropriate parameter values according to their own experience. However, due to the large number of parameters and the inherent correlation between them, manual tuning is very tedious. To solve the problem of tuning through personal experience, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model with deep neural networks, and verified the accuracy and effectiveness of the model from multiple perspectives. Second, in order to improve the search efficiency of better configuration parameters, we improved the Q-learning algorithm, and automatically set start and end states in each iteration of training, which effectively improves the agent’s poor performance in exploring better configuration parameters. Lastly, comparing our proposed configuration with the default configuration as the baseline, experimental results show that the optimized configuration gained an average performance improvement of 47%, 43%, 31%, and 45% for four different types of Spark applications, which indicates that our Spark configuration parameter optimizer could efficiently find the better configuration parameters and improve the performance of various Spark applications.
format Online
Article
Text
id pubmed-9371413
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9371413 2022-08-12 A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization Huang, Xu Zhang, Hong Zhai, Xiaomeng Sensors (Basel) Article MDPI 2022-08-08 /pmc/articles/PMC9371413/ /pubmed/35957487 http://dx.doi.org/10.3390/s22155930 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371413/
https://www.ncbi.nlm.nih.gov/pubmed/35957487
http://dx.doi.org/10.3390/s22155930
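
The description in this record mentions a Q-learning agent that searches Spark configuration parameters with the help of a performance prediction model. As a rough illustration only, the sketch below runs plain tabular Q-learning over a small, made-up grid of three Spark settings. Everything in it is an assumption for illustration: the candidate values, the toy `predicted_runtime` function (a stand-in for the article's neural-network predictor), the random choice of start state, and all hyperparameters. It does not reproduce the authors' modified algorithm or their results.

```python
import random
from collections import defaultdict

# Hypothetical discretized search space (values are assumptions, not from the article).
SPACE = {
    "spark.executor.cores": [1, 2, 4, 8],
    "spark.executor.memory": [1, 2, 4, 8],            # in GB, for illustration
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
}
NAMES = list(SPACE)
# An action moves one parameter one step down or up in its value list.
ACTIONS = [(p, d) for p in range(len(NAMES)) for d in (-1, 1)]


def predicted_runtime(state):
    """Toy surrogate for a learned performance model (assumed shape)."""
    target = (2, 3, 1)  # arbitrary optimum chosen for this toy example
    return 10.0 + sum((i - t) ** 2 for i, t in zip(state, target))


def step(state, action):
    """Apply an action; the reward is the predicted runtime reduction."""
    p, d = action
    idx = min(max(state[p] + d, 0), len(SPACE[NAMES[p]]) - 1)  # clamp to the grid
    nxt = state[:p] + (idx,) + state[p + 1:]
    return nxt, predicted_runtime(state) - predicted_runtime(nxt)


def train(default, episodes=300, steps=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # Q-values keyed by (state, action)
    best = default           # best configuration seen so far
    for _ in range(episodes):
        # Random start state per episode (a simplification, not the article's rule).
        state = tuple(rng.randrange(len(SPACE[n])) for n in NAMES)
        for _ in range(steps):
            if rng.random() < eps:   # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            # Standard Q-learning update.
            td_target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = nxt
            if predicted_runtime(state) < predicted_runtime(best):
                best = state
    return q, best


def decode(state):
    """Map index tuple back to a {parameter: value} dictionary."""
    return {n: SPACE[n][i] for n, i in zip(NAMES, state)}


DEFAULT = (0, 0, 2)  # assumed stand-in for a default configuration
q_table, best_state = train(DEFAULT)
print(decode(best_state), predicted_runtime(best_state))
```

In a real setting the surrogate would be replaced by a model trained on measured application runs, and the chosen values would be passed to Spark through its usual configuration mechanisms.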