
Operational Intelligence for Distributed Computing Systems for Exascale Science

Bibliographic Details
Main authors: Di Girolamo, Alessandro; Legger, Federica; Paparrigopoulos, Panos; Klimentov, Alexei; Schovancová, Jaroslava; Kuznetsov, Valentin; Lassnig, Mario; Clissa, Luca; Rinaldi, Lorenzo; Sharma, Mayank; Bakhshiansohi, Hamed; Zvada, Marian; Bonacorsi, Daniele; Tisbeni, Simone Rossi; Giommi, Luca; Decker De Sousa, Leticia; Diotalevi, Tommaso; Grigorieva, Maria; Padolski, Sergey
Language: English
Published: 2020
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1051/epjconf/202024503017
http://cds.cern.ch/record/2758808
author Di Girolamo, Alessandro
Legger, Federica
Paparrigopoulos, Panos
Klimentov, Alexei
Schovancová, Jaroslava
Kuznetsov, Valentin
Lassnig, Mario
Clissa, Luca
Rinaldi, Lorenzo
Sharma, Mayank
Bakhshiansohi, Hamed
Zvada, Marian
Bonacorsi, Daniele
Tisbeni, Simone Rossi
Giommi, Luca
Decker De Sousa, Leticia
Diotalevi, Tommaso
Grigorieva, Maria
Padolski, Sergey
collection CERN
description In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets requires a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiments' goals, allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters and operational teams are needed to manage such heterogeneous infrastructures efficiently. A wealth of operational data can be exploited to increase the level of automation in computing operations, using suitable techniques, such as machine learning (ML), tailored to specific problems. The Operational Intelligence project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations. We discuss how state-of-the-art technologies can be used to build general solutions to common problems and to reduce the operational cost of the experiment computing infrastructure.
id oai-inspirehep.net-1831510
institution European Organization for Nuclear Research
language eng
publishDate 2020
record_format invenio
title Operational Intelligence for Distributed Computing Systems for Exascale Science
topic Computing and Computers
url https://dx.doi.org/10.1051/epjconf/202024503017
http://cds.cern.ch/record/2758808