Preparing distributed computing operations for HL-LHC era with Operational Intelligence
Main authors: | |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | |
Online access: | http://cds.cern.ch/record/2752591 |
Summary: | The Operational Intelligence (OpInt) project is a joint effort from various WLCG communities aimed at increasing the level of automation in computing operations and reducing human interventions. The currently deployed systems have proven to be mature and capable of meeting the experiment goals by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams are needed to manage such heterogeneous infrastructures efficiently. Under the scope of the OpInt project, experts from most of the relevant areas have gathered to propose and work on “smart” solutions. Machine learning, data mining, log analysis, and anomaly detection are only some of the tools we have evaluated for our use cases. Discussions have led to a number of ideas on how to achieve our goals, and the development of solutions has started. In this contribution, we will report on the development of a suite of OpInt services to cover various use cases: workload management, data management, and site operations. |
---|