Preparing distributed computing operations for the HL-LHC era with Operational Intelligence


Bibliographic Details
Main author: Paparrigopoulos, Panos
Language: eng
Published: 2021
Subjects:
Online access: http://cds.cern.ch/record/2766889
Description
Summary: The Operational Intelligence (OpInt) project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations and reducing human interventions. The currently deployed systems have proven to be mature and capable of meeting the experiments' goals by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams is needed to manage such heterogeneous infrastructures efficiently. Under the scope of the OpInt project, experts from most of the relevant areas have gathered to propose and work on "smart" solutions. Machine learning, data mining, log analysis, and anomaly detection are only some of the tools we have evaluated for our use cases. Discussions have led to a number of ideas on how to achieve our goals, and the development of solutions has started. In this contribution, we will report on the development of a suite of OpInt services covering use cases in workload management, data management, and site operations.