
Improving ATLAS computing resource utilization with HammerCloud

HammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts, providing the automate...


Bibliographic Details
Main authors: Schovancova, Jaroslava, Buehrer, Felix, Caballero-Bejar, Jose, Duckeck, Guenter, Fkiaras, Aristeidis, Legger, Federica, Maier, Thomas, Mancinelli, Valentina, Sciacca, Francesco Giovanni, Yusta Espla, Antonio
Language: eng
Published: 2018
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2625218
_version_ 1780958783432818688
author Schovancova, Jaroslava
Buehrer, Felix
Caballero-Bejar, Jose
Duckeck, Guenter
Fkiaras, Aristeidis
Legger, Federica
Maier, Thomas
Mancinelli, Valentina
Sciacca, Francesco Giovanni
Yusta Espla, Antonio
author_facet Schovancova, Jaroslava
Buehrer, Felix
Caballero-Bejar, Jose
Duckeck, Guenter
Fkiaras, Aristeidis
Legger, Federica
Maier, Thomas
Mancinelli, Valentina
Sciacca, Francesco Giovanni
Yusta Espla, Antonio
author_sort Schovancova, Jaroslava
collection CERN
description HammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts by providing automated resource exclusion and recovery tools, which help refocus operational manpower on areas that have yet to be automated and improve utilization of available computing resources. We present the recent evolution of the auto-exclusion/recovery tools: faster inclusion of new resources in the testing machinery, machine learning algorithms for anomaly detection, categorization of resources as master vs. slave for the purpose of blacklisting, and a tool for auto-exclusion/recovery of resources triggered by Event Service job failures, which is being extended to workflows beyond the Event Service. We describe how HammerCloud helped commission various concepts and components of distributed systems: simplified configuration of queues for workflows of different activities (unified queues), components of Pilot (new movers), components of AGIS (controller), and the distributed data management system (protocols, direct data access, ObjectStore tests). We summarize updates that brought HammerCloud up to date with developments in ADC and improved its flexibility to adapt to new activities and workflows, so that it can respond to the evolving needs of the ADC Operations team in a timely manner.
id cern-2625218
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2018
record_format invenio
spelling cern-2625218 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/2625218 | eng | Schovancova, Jaroslava; Buehrer, Felix; Caballero-Bejar, Jose; Duckeck, Guenter; Fkiaras, Aristeidis; Legger, Federica; Maier, Thomas; Mancinelli, Valentina; Sciacca, Francesco Giovanni; Yusta Espla, Antonio | Improving ATLAS computing resource utilization with HammerCloud | Particle Physics - Experiment | HammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts by providing automated resource exclusion and recovery tools, which help refocus operational manpower on areas that have yet to be automated and improve utilization of available computing resources. We present the recent evolution of the auto-exclusion/recovery tools: faster inclusion of new resources in the testing machinery, machine learning algorithms for anomaly detection, categorization of resources as master vs. slave for the purpose of blacklisting, and a tool for auto-exclusion/recovery of resources triggered by Event Service job failures, which is being extended to workflows beyond the Event Service. We describe how HammerCloud helped commission various concepts and components of distributed systems: simplified configuration of queues for workflows of different activities (unified queues), components of Pilot (new movers), components of AGIS (controller), and the distributed data management system (protocols, direct data access, ObjectStore tests). We summarize updates that brought HammerCloud up to date with developments in ADC and improved its flexibility to adapt to new activities and workflows, so that it can respond to the evolving needs of the ADC Operations team in a timely manner. | ATL-SOFT-SLIDE-2018-392 | oai:cds.cern.ch:2625218 | 2018-06-21
spellingShingle Particle Physics - Experiment
Schovancova, Jaroslava
Buehrer, Felix
Caballero-Bejar, Jose
Duckeck, Guenter
Fkiaras, Aristeidis
Legger, Federica
Maier, Thomas
Mancinelli, Valentina
Sciacca, Francesco Giovanni
Yusta Espla, Antonio
Improving ATLAS computing resource utilization with HammerCloud
title Improving ATLAS computing resource utilization with HammerCloud
title_full Improving ATLAS computing resource utilization with HammerCloud
title_fullStr Improving ATLAS computing resource utilization with HammerCloud
title_full_unstemmed Improving ATLAS computing resource utilization with HammerCloud
title_short Improving ATLAS computing resource utilization with HammerCloud
title_sort improving atlas computing resource utilization with hammercloud
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2625218
work_keys_str_mv AT schovancovajaroslava improvingatlascomputingresourceutilizationwithhammercloud
AT buehrerfelix improvingatlascomputingresourceutilizationwithhammercloud
AT caballerobejarjose improvingatlascomputingresourceutilizationwithhammercloud
AT duckeckguenter improvingatlascomputingresourceutilizationwithhammercloud
AT fkiarasaristeidis improvingatlascomputingresourceutilizationwithhammercloud
AT leggerfederica improvingatlascomputingresourceutilizationwithhammercloud
AT maierthomas improvingatlascomputingresourceutilizationwithhammercloud
AT mancinellivalentina improvingatlascomputingresourceutilizationwithhammercloud
AT sciaccafrancescogiovanni improvingatlascomputingresourceutilizationwithhammercloud
AT yustaesplaantonio improvingatlascomputingresourceutilizationwithhammercloud