Automating usability of ATLAS Distributed Computing resources
The automation of ATLAS Distributed Computing (ADC) operations is essential to reduce manpower costs and allow performance-enhancing actions, which improve the reliability of the system. In this perspective a crucial case is the automatic exclusion/recovery of ATLAS computing sites storage resour...
Main authors: | Tupputi, S A; Di Girolamo, A; Kouba, T; Schovancova, J |
---|---|
Language: | eng |
Published: | 2013 |
Subjects: | Detectors and Experimental Techniques |
Online access: | https://dx.doi.org/10.1088/1742-6596/513/3/032098 http://cds.cern.ch/record/1621951 |
_version_ | 1780933217660960768 |
---|---|
author | Tupputi, S A; Di Girolamo, A; Kouba, T; Schovancova, J |
author_facet | Tupputi, S A; Di Girolamo, A; Kouba, T; Schovancova, J |
author_sort | Tupputi, S A |
collection | CERN |
description | The automation of ATLAS Distributed Computing (ADC) operations is essential to reduce manpower costs and to allow performance-enhancing actions that improve the reliability of the system. In this perspective, a crucial case is the automatic exclusion/recovery of the storage resources of ATLAS computing sites, which are continuously exploited at the edge of their capabilities. It is challenging to adopt unambiguous decision criteria for storage resources that feature non-homogeneous types, sizes and roles. The recently developed Storage Area Automatic Blacklisting (SAAB) tool has provided a suitable solution by employing an inference algorithm that processes the outcome of site-by-site SAM (Site Availability Test) SRM tests. SAAB accomplishes both tasks: providing global monitoring and performing automatic operations on single sites. The implementation of the SAAB tool has been the first step in a comprehensive review of storage area monitoring and central management at all levels. This review has involved the reordering and optimization of SAM test deployment and the inclusion of SAAB results in the ATLAS Site Status Board with both dedicated metrics and views. The final structure allows the storage resource statuses to be monitored with fine time granularity and automatic actions to be taken in foreseen cases, such as automatic exclusion/recovery and notifications to sites. Hence, human actions are restricted to tracking and exchanging tickets, where and when needed. In this work we show the SAAB working principles and features. We also present the decrease in human interactions achieved within the ATLAS Computing Operation team. The automation results in a prompt reaction to failures, which leads to optimized resource exploitation. |
id | cern-1621951 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2013 |
record_format | invenio |
spelling | cern-1621951 2019-09-30T06:29:59Z doi:10.1088/1742-6596/513/3/032098 http://cds.cern.ch/record/1621951 eng Tupputi, S A; Di Girolamo, A; Kouba, T; Schovancova, J Automating usability of ATLAS Distributed Computing resources Detectors and Experimental Techniques The automation of ATLAS Distributed Computing (ADC) operations is essential to reduce manpower costs and to allow performance-enhancing actions that improve the reliability of the system. In this perspective, a crucial case is the automatic exclusion/recovery of the storage resources of ATLAS computing sites, which are continuously exploited at the edge of their capabilities. It is challenging to adopt unambiguous decision criteria for storage resources that feature non-homogeneous types, sizes and roles. The recently developed Storage Area Automatic Blacklisting (SAAB) tool has provided a suitable solution by employing an inference algorithm that processes the outcome of site-by-site SAM (Site Availability Test) SRM tests. SAAB accomplishes both tasks: providing global monitoring and performing automatic operations on single sites. The implementation of the SAAB tool has been the first step in a comprehensive review of storage area monitoring and central management at all levels. This review has involved the reordering and optimization of SAM test deployment and the inclusion of SAAB results in the ATLAS Site Status Board with both dedicated metrics and views. The final structure allows the storage resource statuses to be monitored with fine time granularity and automatic actions to be taken in foreseen cases, such as automatic exclusion/recovery and notifications to sites. Hence, human actions are restricted to tracking and exchanging tickets, where and when needed. In this work we show the SAAB working principles and features. We also present the decrease in human interactions achieved within the ATLAS Computing Operation team. The automation results in a prompt reaction to failures, which leads to optimized resource exploitation. ATL-SOFT-PROC-2013-035 oai:cds.cern.ch:1621951 2013-10-29 |
spellingShingle | Detectors and Experimental Techniques Tupputi, S A; Di Girolamo, A; Kouba, T; Schovancova, J Automating usability of ATLAS Distributed Computing resources |
title | Automating usability of ATLAS Distributed Computing resources |
title_full | Automating usability of ATLAS Distributed Computing resources |
title_fullStr | Automating usability of ATLAS Distributed Computing resources |
title_full_unstemmed | Automating usability of ATLAS Distributed Computing resources |
title_short | Automating usability of ATLAS Distributed Computing resources |
title_sort | automating usability of atlas distributed computing resources |
topic | Detectors and Experimental Techniques |
url | https://dx.doi.org/10.1088/1742-6596/513/3/032098 http://cds.cern.ch/record/1621951 |
work_keys_str_mv | AT tupputisa automatingusabilityofatlasdistributedcomputingresources AT digirolamoa automatingusabilityofatlasdistributedcomputingresources AT koubat automatingusabilityofatlasdistributedcomputingresources AT schovancovaj automatingusabilityofatlasdistributedcomputingresources |