
Automating usability of ATLAS Distributed Computing resources

The automation of ATLAS Distributed Computing (ADC) operations is essential to reduce manpower costs and to allow performance-enhancing actions that improve the reliability of the system. In this context, a crucial case is the automatic exclusion/recovery of ATLAS computing sites storage resour...

Bibliographic Details
Main authors: Tupputi, S A, Di Girolamo, A, Kouba, T, Schovancova, J
Language: eng
Published: 2013
Subjects:
Online access: https://dx.doi.org/10.1088/1742-6596/513/3/032098
http://cds.cern.ch/record/1621951
_version_ 1780933217660960768
author Tupputi, S A
Di Girolamo, A
Kouba, T
Schovancova, J
author_facet Tupputi, S A
Di Girolamo, A
Kouba, T
Schovancova, J
author_sort Tupputi, S A
collection CERN
description The automation of ATLAS Distributed Computing (ADC) operations is essential to reduce manpower costs and to allow performance-enhancing actions that improve the reliability of the system. In this context, a crucial case is the automatic exclusion and recovery of the storage resources of ATLAS computing sites, which are continuously exploited at the edge of their capabilities. It is challenging to adopt unambiguous decision criteria for storage resources that feature non-homogeneous types, sizes and roles. The recently developed Storage Area Automatic Blacklisting (SAAB) tool provides a suitable solution by employing an inference algorithm that processes the outcomes of site-by-site SAM (Site Availability Test) SRM tests. SAAB both provides global monitoring and performs automatic operations on single sites. The implementation of the SAAB tool has been the first step in a comprehensive review of storage area monitoring and central management at all levels. This review has involved reordering and optimizing the deployment of SAM tests and including SAAB results in the ATLAS Site Status Board with dedicated metrics and views. The final structure allows the status of storage resources to be monitored with fine time granularity and automatic actions to be taken in foreseen cases, such as automatic exclusion/recovery and notifications to sites. Hence, human actions are restricted to tracking and exchanging tickets, where and when needed. In this work we show the working principles and features of SAAB. We also present the decrease in human interactions achieved within the ATLAS Computing Operation team. The automation results in a prompt reaction to failures, which enables optimized resource exploitation.
id cern-1621951
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2013
record_format invenio
spelling cern-1621951 2019-09-30T06:29:59Z doi:10.1088/1742-6596/513/3/032098 http://cds.cern.ch/record/1621951 eng Tupputi, S A Di Girolamo, A Kouba, T Schovancova, J Automating usability of ATLAS Distributed Computing resources Detectors and Experimental Techniques ATL-SOFT-PROC-2013-035 oai:cds.cern.ch:1621951 2013-10-29
spellingShingle Detectors and Experimental Techniques
Tupputi, S A
Di Girolamo, A
Kouba, T
Schovancova, J
Automating usability of ATLAS Distributed Computing resources
title Automating usability of ATLAS Distributed Computing resources
title_full Automating usability of ATLAS Distributed Computing resources
title_fullStr Automating usability of ATLAS Distributed Computing resources
title_full_unstemmed Automating usability of ATLAS Distributed Computing resources
title_short Automating usability of ATLAS Distributed Computing resources
title_sort automating usability of atlas distributed computing resources
topic Detectors and Experimental Techniques
url https://dx.doi.org/10.1088/1742-6596/513/3/032098
http://cds.cern.ch/record/1621951
work_keys_str_mv AT tupputisa automatingusabilityofatlasdistributedcomputingresources
AT digirolamoa automatingusabilityofatlasdistributedcomputingresources
AT koubat automatingusabilityofatlasdistributedcomputingresources
AT schovancovaj automatingusabilityofatlasdistributedcomputingresources