Computing Resource Optimization for a Log Monitoring System
Main Authors: | , , |
Language: | English |
Published: | 2022 |
Subjects: | |
Online Access: | https://dx.doi.org/10.1109/ICKII55100.2022.9983580 http://cds.cern.ch/record/2846164 |
Summary: | A Large Ion Collider Experiment (ALICE) at the Large Hadron Collider (LHC) in the European Organization for Nuclear Research (CERN) laboratory was built to study heavy-ion collisions and the properties of the quark-gluon plasma. The experiment's Online and Offline (O2) software systems generate a huge amount of log data, which is used for monitoring to detect potential system failures. Elasticsearch was selected as the log storage and search engine for the monitoring system. One of the main problems is how to allocate computing resources for Elasticsearch so as to minimize cost while satisfying performance thresholds (i.e., throughput). Moreover, the lack of knowledge about the search engine's behavior makes it difficult to find the best configuration. Exhaustive search is one possible approach, but it is impractical because it consumes a lot of time and computing resources. Given these limited resources, Bayesian optimization is applied instead. The Bayesian method requires only a few samples to build a surrogate function that roughly approximates the objective function, i.e., minimizing cost while satisfying the performance requirements. The method then explores only the regions where the optimal solution lies with high probability. The results show that Bayesian optimization finds the optimal or a near-optimal computing resource configuration for the given benchmark experiments while requiring only about half as many evaluations as other methods, e.g., exhaustive search, regression, and machine learning. The impact of several acquisition functions and initial sample generators was studied in order to find the best solution. These insights can help system operators find an optimal computing resource configuration quickly and efficiently. |
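The summary does not give the paper's exact formulation, so what follows is only a minimal sketch of the general technique under stated assumptions. It assumes a hypothetical grid of Elasticsearch node sizes (CPU cores, memory in GB), made-up unit prices, and a made-up throughput threshold. A synthetic `benchmark` function stands in for real benchmark runs. The throughput constraint is folded into the objective as a fixed cost penalty when the threshold is missed; this is an assumption, and constrained acquisition functions are another option. A Gaussian-process surrogate with an RBF kernel is fitted to the configurations evaluated so far. Expected improvement (EI) serves as the acquisition function, and the initial samples are drawn uniformly at random. All names (`benchmark`, `objective`, `bayes_opt`, and so on) are hypothetical illustrations, not the authors' code.

```python
import math
import random

# Hypothetical helpers throughout; not taken from the paper.
# Hypothetical search space: (CPU cores, memory in GB) for one Elasticsearch node.
CPUS = [1, 2, 4, 6, 8, 12, 16]
MEMS = [2, 4, 8, 16, 32]
GRID = [(c, m) for c in CPUS for m in MEMS]

PRICE_CPU, PRICE_MEM = 1.0, 0.25  # hypothetical cost per core and per GB
THRESHOLD = 5000.0                # hypothetical required throughput (docs/s)
PENALTY = 100.0                   # assumed penalty when the threshold is missed
NOISE = 1e-4                      # jitter added to the kernel diagonal

def benchmark(cpu, mem):
    """Synthetic stand-in for a real benchmark run (made-up saturating model)."""
    return 9000.0 * (1 - math.exp(-cpu / 4)) * (1 - math.exp(-mem / 8))

def objective(cfg):
    """Cost of a configuration, penalized if it misses the throughput threshold."""
    cpu, mem = cfg
    cost = PRICE_CPU * cpu + PRICE_MEM * mem
    return cost if benchmark(cpu, mem) >= THRESHOLD else cost + PENALTY

def features(cfg):
    """Map (cpu, mem) onto [0, 1]^2 on a log2 scale."""
    cpu, mem = cfg
    return (math.log2(cpu) / 4.0, (math.log2(mem) - 1.0) / 4.0)

def kernel(a, b, length=0.3):
    """Squared-exponential (RBF) kernel with unit variance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def bayes_opt(n_init=5, budget=15, seed=0):
    rng = random.Random(seed)
    # Initial sample generator: uniform random draws without replacement.
    history = [(c, objective(c)) for c in rng.sample(GRID, n_init)]
    while len(history) < budget:
        X = [features(c) for c, _ in history]
        ys = [v for _, v in history]
        mean = sum(ys) / len(ys)
        std = (sum((v - mean) ** 2 for v in ys) / len(ys)) ** 0.5 or 1.0
        y = [(v - mean) / std for v in ys]  # standardize observations
        K = [[kernel(a, b) + (NOISE if i == j else 0.0) for j, b in enumerate(X)]
             for i, a in enumerate(X)]
        L = cholesky(K)
        alpha = backward(L, forward(L, y))  # alpha = K^-1 y
        best = min(y)

        def expected_improvement(cfg):
            # GP posterior mean and std at cfg, then EI for minimization.
            k_star = [kernel(features(cfg), x) for x in X]
            mu = sum(k * a for k, a in zip(k_star, alpha))
            v = forward(L, k_star)
            sigma = math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))
            z = (best - mu) / sigma
            cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            return (best - mu) * cdf + sigma * pdf

        seen = {c for c, _ in history}
        nxt = max((c for c in GRID if c not in seen), key=expected_improvement)
        history.append((nxt, objective(nxt)))
    return min(history, key=lambda h: h[1]), history

if __name__ == "__main__":
    (cfg, val), hist = bayes_opt()
    print(f"best {cfg} objective {val:.2f} after {len(hist)} of {len(GRID)} evaluations")
```

To mirror the comparisons the summary describes, one could replace `expected_improvement` with probability of improvement or a lower confidence bound, or replace `rng.sample` with a Latin hypercube or Sobol design. Under these made-up numbers, exhaustive search over all 35 grid points finds (6, 16) at cost 10.0. The 15-evaluation budget is illustrative only and does not reproduce the paper's results.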