Improving ATLAS computing resource utilization with HammerCloud
HammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts, providing the automate...
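Main authors: | Schovancova, Jaroslava; Buehrer, Felix; Caballero-Bejar, Jose; Duckeck, Guenter; Fkiaras, Aristeidis; Legger, Federica; Maier, Thomas; Mancinelli, Valentina; Sciacca, Francesco Giovanni; Yusta Espla, Antonio |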
---|---|
Language: | eng |
Published: | 2018 |
Subjects: | Particle Physics - Experiment |
Online access: | http://cds.cern.ch/record/2625218 |
_version_ | 1780958783432818688 |
---|---|
author | Schovancova, Jaroslava Buehrer, Felix Caballero-Bejar, Jose Duckeck, Guenter Fkiaras, Aristeidis Legger, Federica Maier, Thomas Mancinelli, Valentina Sciacca, Francesco Giovanni Yusta Espla, Antonio |
author_facet | Schovancova, Jaroslava Buehrer, Felix Caballero-Bejar, Jose Duckeck, Guenter Fkiaras, Aristeidis Legger, Federica Maier, Thomas Mancinelli, Valentina Sciacca, Francesco Giovanni Yusta Espla, Antonio |
author_sort | Schovancova, Jaroslava |
collection | CERN |
description | HammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts by providing automated resource exclusion and recovery tools that help re-focus operational manpower on areas which have yet to be automated and improve utilization of available computing resources. We present the recent evolution of the auto-exclusion/recovery tools: faster inclusion of new resources in the testing machinery, machine learning algorithms for anomaly detection, categorization of resources as master vs. slave for the purpose of blacklisting, and a tool for auto-exclusion/recovery of resources triggered by Event Service job failures, which is being extended to workflows other than the Event Service. We describe how HammerCloud helped commission various concepts and components of distributed systems: simplified configuration of queues for workflows of different activities (unified queues), components of Pilot (new movers), components of AGIS (controller), and the distributed data management system (protocols, direct data access, ObjectStore tests). We summarize updates that brought HammerCloud up to date with developments in ADC and improved its flexibility to adapt to new activities and workflows, so that it can respond to the evolving needs of the ADC Operations team in a timely manner. |
id | cern-2625218 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2018 |
record_format | invenio |
spelling | cern-26252182019-09-30T06:29:59Zhttp://cds.cern.ch/record/2625218engSchovancova, JaroslavaBuehrer, FelixCaballero-Bejar, JoseDuckeck, GuenterFkiaras, AristeidisLegger, FedericaMaier, ThomasMancinelli, ValentinaSciacca, Francesco GiovanniYusta Espla, AntonioImproving ATLAS computing resource utilization with HammerCloudParticle Physics - ExperimentHammerCloud is a framework to commission, test, and benchmark ATLAS computing resources and components of various distributed systems with realistic full-chain experiment workflows. HammerCloud contributes to ATLAS Distributed Computing (ADC) Operations and automation efforts, providing the automated resource exclusion and recovery tools, that help re-focus operational manpower to areas which have yet to be automated, and improve utilization of available computing resources. We present recent evolution of the auto-exclusion/recovery tools: faster inclusion of new resources in testing machinery, machine learning algorithms for anomaly detection, categorized resources as master vs. slave for the purpose of blacklisting, and a tool for auto-exclusion/recovery of resources triggered by Event Service job failures that is being extended to other workflows besides the Event Service. We describe how HammerCloud helped commissioning various concepts and components of distributed systems: simplified configuration of queues for workflows of different activities (unified queues), components of Pilot (new movers), components of AGIS (controller), distributed data management system (protocols, direct data access, ObjectStore tests). We summarize updates that brought HammerCloud up to date with developments in ADC and improved its flexibility to adapt to the new activities and workflows to respond to evolving needs of the ADC Operations team in a timely manner.ATL-SOFT-SLIDE-2018-392oai:cds.cern.ch:26252182018-06-21 |
spellingShingle | Particle Physics - Experiment Schovancova, Jaroslava Buehrer, Felix Caballero-Bejar, Jose Duckeck, Guenter Fkiaras, Aristeidis Legger, Federica Maier, Thomas Mancinelli, Valentina Sciacca, Francesco Giovanni Yusta Espla, Antonio Improving ATLAS computing resource utilization with HammerCloud |
title | Improving ATLAS computing resource utilization with HammerCloud |
title_full | Improving ATLAS computing resource utilization with HammerCloud |
title_fullStr | Improving ATLAS computing resource utilization with HammerCloud |
title_full_unstemmed | Improving ATLAS computing resource utilization with HammerCloud |
title_short | Improving ATLAS computing resource utilization with HammerCloud |
title_sort | improving atlas computing resource utilization with hammercloud |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2625218 |