A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization

Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters for users to manually select the appropriate parameter values according to their own experience. However, due to the la...

Bibliographic Details
Main authors:	Huang, Xu; Zhang, Hong; Zhai, Xiaomeng
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371413/ https://www.ncbi.nlm.nih.gov/pubmed/35957487 http://dx.doi.org/10.3390/s22155930

_version_	1784767133648945152
author	Huang, Xu Zhang, Hong Zhai, Xiaomeng
author_facet	Huang, Xu Zhang, Hong Zhai, Xiaomeng
author_sort	Huang, Xu
collection	PubMed
description	Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters for users to manually select the appropriate parameter values according to their own experience. However, due to the large number of parameters and the inherent correlation between them, manual tuning is very tedious. To solve the problem of tuning through personal experience, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model with deep neural networks, and verified the accuracy and effectiveness of the model from multiple perspectives. Second, in order to improve the search efficiency of better configuration parameters, we improved the Q-learning algorithm, and automatically set start and end states in each iteration of training, which effectively improves the agent’s poor performance in exploring better configuration parameters. Lastly, comparing our proposed configuration with the default configuration as the baseline, experimental results show that the optimized configuration gained an average performance improvement of 47%, 43%, 31%, and 45% for four different types of Spark applications, which indicates that our Spark configuration parameter optimizer could efficiently find the better configuration parameters and improve the performance of various Spark applications.
format	Online Article Text
id	pubmed-9371413
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9371413 2022-08-12 A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization Huang, Xu Zhang, Hong Zhai, Xiaomeng Sensors (Basel) Article MDPI 2022-08-08 /pmc/articles/PMC9371413/ /pubmed/35957487 http://dx.doi.org/10.3390/s22155930 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Huang, Xu Zhang, Hong Zhai, Xiaomeng A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization
title	A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371413/ https://www.ncbi.nlm.nih.gov/pubmed/35957487 http://dx.doi.org/10.3390/s22155930