
Applying Big Data solutions for log analytics in the PanDA infrastructure

PanDA is the workflow management system of the ATLAS experiment at the LHC and is responsible for generating, brokering and monitoring up to two million jobs per day across 150 computing centers in the Worldwide LHC Computing Grid. The PanDA core consists of several components deployed centrally on...

Bibliographic Details
Main authors:	Alekseev, Aleksandr; Barreiro, Fernando; Klimentov, Alexei; Korchuganova, Tatiana; Maeno, Tadashi; Padolski, Siarhei
Language:	eng
Published:	2017
Subjects:	Particle Physics - Experiment
Online access:	http://cds.cern.ch/record/2285401

_version_	1780955875627761664
author	Alekseev, Aleksandr; Barreiro, Fernando; Klimentov, Alexei; Korchuganova, Tatiana; Maeno, Tadashi; Padolski, Siarhei
collection	CERN
description	PanDA is the workflow management system of the ATLAS experiment at the LHC and is responsible for generating, brokering and monitoring up to two million jobs per day across 150 computing centers in the Worldwide LHC Computing Grid. The PanDA core consists of several components deployed centrally on around 20 servers, which together produce around 400 GB of logs per day. In certain cases, troubleshooting a particular issue in the raw log files is like searching for a needle in a haystack and requires a high level of expertise. We therefore decided to build on current Big Data solutions and use the ELK stack (Filebeat, Logstash, Elasticsearch and Kibana) to process, index and analyze our log files. This reduces troubleshooting complexity, gives the operations team a better interface and produces advanced analytics that help us understand our system. This paper describes the features of the ELK stack, our infrastructure, and the configuration settings and filters we found to work best. We provide examples of graphs and dashboards generated through the ELK system to demonstrate its potential. Finally, we show the current integration of Kibana with the PanDA monitoring frontend and other possible uses, such as proactive notification of exceptions in the system.
id	cern-2285401
institution	European Organization for Nuclear Research (CERN)
language	eng
publishDate	2017
record_format	invenio
spelling	cern-2285401 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/2285401 | eng | Alekseev, Aleksandr; Barreiro, Fernando; Klimentov, Alexei; Korchuganova, Tatiana; Maeno, Tadashi; Padolski, Siarhei | Applying Big Data solutions for log analytics in the PanDA infrastructure | Particle Physics - Experiment | ATL-SOFT-SLIDE-2017-803 | oai:cds.cern.ch:2285401 | 2017-09-22
title	Applying Big Data solutions for log analytics in the PanDA infrastructure
topic	Particle Physics - Experiment
url	http://cds.cern.ch/record/2285401
