Operational Intelligence for Distributed Computing Systems for Exascale Science
In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, b...
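Main authors: | Di Girolamo, Alessandro; Legger, Federica; Paparrigopoulos, Panos; Klimentov, Alexei; Schovancová, Jaroslava; Kuznetsov, Valentin; Lassnig, Mario; Clissa, Luca; Rinaldi, Lorenzo; Sharma, Mayank; Bakhshiansohi, Hamed; Zvada, Marian; Bonacorsi, Daniele; Tisbeni, Simone Rossi; Giommi, Luca; Decker De Sousa, Leticia; Diotalevi, Tommaso; Grigorieva, Maria; Padolski, Sergey |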
---|---|
Language: | eng |
Published: | 2020 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1051/epjconf/202024503017 http://cds.cern.ch/record/2758808 |
_version_ | 1780970192759685120 |
---|---|
author | Di Girolamo, Alessandro Legger, Federica Paparrigopoulos, Panos Klimentov, Alexei Schovancová, Jaroslava Kuznetsov, Valentin Lassnig, Mario Clissa, Luca Rinaldi, Lorenzo Sharma, Mayank Bakhshiansohi, Hamed Zvada, Marian Bonacorsi, Daniele Tisbeni, Simone Rossi Giommi, Luca Decker De Sousa, Leticia Diotalevi, Tommaso Grigorieva, Maria Padolski, Sergey |
author_facet | Di Girolamo, Alessandro Legger, Federica Paparrigopoulos, Panos Klimentov, Alexei Schovancová, Jaroslava Kuznetsov, Valentin Lassnig, Mario Clissa, Luca Rinaldi, Lorenzo Sharma, Mayank Bakhshiansohi, Hamed Zvada, Marian Bonacorsi, Daniele Tisbeni, Simone Rossi Giommi, Luca Decker De Sousa, Leticia Diotalevi, Tommaso Grigorieva, Maria Padolski, Sergey |
author_sort | Di Girolamo, Alessandro |
collection | CERN |
description | In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, by allowing timely delivery of scientific results. However, a substantial amount of interventions from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. A wealth of operational data can be exploited to increase the level of automation in computing operations by using adequate techniques, such as machine learning (ML), tailored to solve specific problems. The Operational Intelligence project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations. We discuss how state-of-the-art technologies can be used to build general solutions to common problems and to reduce the operational cost of the experiment computing infrastructure. |
id | oai-inspirehep.net-1831510 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2020 |
record_format | invenio |
spelling | oai-inspirehep.net-18315102021-03-25T22:33:55Zdoi:10.1051/epjconf/202024503017http://cds.cern.ch/record/2758808engDi Girolamo, AlessandroLegger, FedericaPaparrigopoulos, PanosKlimentov, AlexeiSchovancová, JaroslavaKuznetsov, ValentinLassnig, MarioClissa, LucaRinaldi, LorenzoSharma, MayankBakhshiansohi, HamedZvada, MarianBonacorsi, DanieleTisbeni, Simone RossiGiommi, LucaDecker De Sousa, LeticiaDiotalevi, TommasoGrigorieva, MariaPadolski, SergeyOperational Intelligence for Distributed Computing Systems for Exascale ScienceComputing and ComputersIn the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, by allowing timely delivery of scientific results. However, a substantial amount of interventions from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. A wealth of operational data can be exploited to increase the level of automation in computing operations by using adequate techniques, such as machine learning (ML), tailored to solve specific problems. The Operational Intelligence project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations. We discuss how state-of-the-art technologies can be used to build general solutions to common problems and to reduce the operational cost of the experiment computing infrastructure.oai:inspirehep.net:18315102020 |
spellingShingle | Computing and Computers Di Girolamo, Alessandro Legger, Federica Paparrigopoulos, Panos Klimentov, Alexei Schovancová, Jaroslava Kuznetsov, Valentin Lassnig, Mario Clissa, Luca Rinaldi, Lorenzo Sharma, Mayank Bakhshiansohi, Hamed Zvada, Marian Bonacorsi, Daniele Tisbeni, Simone Rossi Giommi, Luca Decker De Sousa, Leticia Diotalevi, Tommaso Grigorieva, Maria Padolski, Sergey Operational Intelligence for Distributed Computing Systems for Exascale Science |
title | Operational Intelligence for Distributed Computing Systems for Exascale Science |
title_full | Operational Intelligence for Distributed Computing Systems for Exascale Science |
title_fullStr | Operational Intelligence for Distributed Computing Systems for Exascale Science |
title_full_unstemmed | Operational Intelligence for Distributed Computing Systems for Exascale Science |
title_short | Operational Intelligence for Distributed Computing Systems for Exascale Science |
title_sort | operational intelligence for distributed computing systems for exascale science |
topic | Computing and Computers |
url | https://dx.doi.org/10.1051/epjconf/202024503017 http://cds.cern.ch/record/2758808 |
work_keys_str_mv | AT digirolamoalessandro operationalintelligencefordistributedcomputingsystemsforexascalescience AT leggerfederica operationalintelligencefordistributedcomputingsystemsforexascalescience AT paparrigopoulospanos operationalintelligencefordistributedcomputingsystemsforexascalescience AT klimentovalexei operationalintelligencefordistributedcomputingsystemsforexascalescience AT schovancovajaroslava operationalintelligencefordistributedcomputingsystemsforexascalescience AT kuznetsovvalentin operationalintelligencefordistributedcomputingsystemsforexascalescience AT lassnigmario operationalintelligencefordistributedcomputingsystemsforexascalescience AT clissaluca operationalintelligencefordistributedcomputingsystemsforexascalescience AT rinaldilorenzo operationalintelligencefordistributedcomputingsystemsforexascalescience AT sharmamayank operationalintelligencefordistributedcomputingsystemsforexascalescience AT bakhshiansohihamed operationalintelligencefordistributedcomputingsystemsforexascalescience AT zvadamarian operationalintelligencefordistributedcomputingsystemsforexascalescience AT bonacorsidaniele operationalintelligencefordistributedcomputingsystemsforexascalescience AT tisbenisimonerossi operationalintelligencefordistributedcomputingsystemsforexascalescience AT giommiluca operationalintelligencefordistributedcomputingsystemsforexascalescience AT deckerdesousaleticia operationalintelligencefordistributedcomputingsystemsforexascalescience AT diotalevitommaso operationalintelligencefordistributedcomputingsystemsforexascalescience AT grigorievamaria operationalintelligencefordistributedcomputingsystemsforexascalescience AT padolskisergey operationalintelligencefordistributedcomputingsystemsforexascalescience |