A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization
Main authors: | Huang, Xu; Zhang, Hong; Zhai, Xiaomeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371413/ https://www.ncbi.nlm.nih.gov/pubmed/35957487 http://dx.doi.org/10.3390/s22155930 |
author | Huang, Xu; Zhang, Hong; Zhai, Xiaomeng |
---|---|
collection | PubMed |
description | Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters, whose values users must select manually based on their own experience. However, because of the large number of parameters and the inherent correlations between them, manual tuning is very tedious. To remove this dependence on personal experience, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a deep neural network to predict Spark application performance and verified the model's accuracy and effectiveness from multiple perspectives. Second, to search for better configuration parameters more efficiently, we improved the Q-learning algorithm so that it automatically sets the start and end states in each training iteration, which effectively mitigates the agent's poor performance when exploring for better configuration parameters. Lastly, with the default configuration as the baseline, experimental results show that the optimized configuration achieved average performance improvements of 47%, 43%, 31%, and 45% on four different types of Spark applications, indicating that our optimizer can efficiently find better configuration parameters and improve the performance of various Spark applications. |
format | Online Article Text |
id | pubmed-9371413 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Sensors (Basel) |
published | 2022-08-08 |
license | © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371413/ https://www.ncbi.nlm.nih.gov/pubmed/35957487 http://dx.doi.org/10.3390/s22155930 |
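
The description above pairs a learned performance model with a Q-learning search over Spark configurations. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation. The toy analytic function `predicted_runtime` stands in for the trained deep neural network. The three parameters and their value grids are assumptions loosely modeled on the Spark properties `spark.executor.cores`, `spark.executor.memory`, and `spark.sql.shuffle.partitions`. Starting each episode from a random configuration and ending it after a fixed step budget is an assumed simplification, not necessarily how the paper sets start and end states.

```python
import random

# Hypothetical discretized search space. The keys loosely mirror the Spark
# properties spark.executor.cores, spark.executor.memory (in GB) and
# spark.sql.shuffle.partitions; the candidate values are assumptions.
PARAM_GRID = {
    "executor_cores": [1, 2, 4, 8],
    "executor_memory_gb": [2, 4, 8, 16],
    "shuffle_partitions": [50, 100, 200, 400],
}
NAMES = list(PARAM_GRID)
# One action = move a single parameter one step down (-1) or up (+1) its grid.
ACTIONS = [(i, d) for i in range(len(NAMES)) for d in (-1, 1)]


def predicted_runtime(config):
    """Toy stand-in for the trained DNN performance model (hypothetical).

    Lower is better; the unique minimum (100.0) is at 4 cores, 8 GB, 200 partitions.
    """
    return (100
            + (config["executor_cores"] - 4) ** 2
            + 0.5 * (config["executor_memory_gb"] - 8) ** 2
            + ((config["shuffle_partitions"] - 200) / 50) ** 2)


def to_config(state):
    """Map a state (tuple of grid indices) to concrete parameter values."""
    return {name: PARAM_GRID[name][idx] for name, idx in zip(NAMES, state)}


def step(state, action):
    """Apply an action, clamping the index to the bounds of that parameter's grid."""
    i, delta = action
    idx = list(state)
    idx[i] = min(max(idx[i] + delta, 0), len(PARAM_GRID[NAMES[i]]) - 1)
    return tuple(idx)


def q_learning(episodes=500, max_steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over the grid; returns the best configuration reached."""
    rng = random.Random(seed)
    q = {}  # (state, action index) -> estimated value, default 0.0
    best_state, best_time = None, float("inf")
    for _ in range(episodes):
        # Assumption: each episode starts from a random configuration and ends
        # after a fixed step budget.
        state = tuple(rng.randrange(len(PARAM_GRID[n])) for n in NAMES)
        for _ in range(max_steps):
            if rng.random() < epsilon:  # explore: random action
                a = rng.randrange(len(ACTIONS))
            else:  # exploit: greedy action under the current Q estimates
                a = max(range(len(ACTIONS)), key=lambda k: q.get((state, k), 0.0))
            nxt = step(state, ACTIONS[a])
            t_now = predicted_runtime(to_config(state))
            t_next = predicted_runtime(to_config(nxt))
            reward = t_now - t_next  # positive when the predicted runtime drops
            best_next = max(q.get((nxt, k), 0.0) for k in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            # Standard Q-learning update toward reward + discounted best next value.
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            if t_next < best_time:
                best_state, best_time = nxt, t_next
            state = nxt
    return to_config(best_state), best_time


if __name__ == "__main__":
    default = to_config((0, 0, 2))  # hypothetical "default" configuration
    best, best_time = q_learning()
    print("default:", default, predicted_runtime(default))
    print("best:   ", best, best_time)
```

Because the toy objective has a known minimum, the sketch is easy to sanity-check. Querying a surrogate model instead of launching real Spark jobs keeps each training step cheap, which is one reason to train the agent against a learned performance model.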