
Scalable Data Processing Model of the ALICE Experiment in the Cloud


Bibliographic Details
Main author: Loncar, Petra
Language: eng
Published: 2023
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/2874778
author Loncar, Petra
collection CERN
description This thesis proposes an optimisation strategy for scalable Big Data processing in a heterogeneous Cloud. The resource needs of A Large Ion Collider Experiment (ALICE) at the European Organization for Nuclear Research (CERN) are reviewed as a motivating example. The thesis examines how to efficiently process resource-intensive tasks on a heterogeneous Cloud infrastructure distributed across five data centres to meet the needs of the ALICE experiment at the Tier 2 level. The objective was to study a much larger number of tasks, and resources of significantly larger capacity, than prior studies, which considered fewer tasks and lower-capacity resources. The proposed processing model for ALICE Monte Carlo production is based on a centralised, software-defined management approach to heterogeneous resources. Algorithms for assigning tasks to heterogeneous virtual resources are analysed and proposed. They are based on the Evolution Strategies metaheuristic, which had not previously been applied in this domain: a plain Evolution Strategies algorithm, Evolution Strategies with a Longest Job First broker policy, and Evolution Strategies with a Shortest Job First broker policy. The Cloud system model is implemented in the open-source CloudSim simulator. ALICE Monte Carlo production job requirements are imported into the simulation model as a workload in Standard Workload Format (SWF), adapted for the Cloud simulator. The simulated performance of the reference implementation under different loads is analysed and compared with a Genetic Algorithm from the same family of algorithms. The results show multiple improvements: the proposed data processing model enables centralised software management of a heterogeneous Cloud infrastructure, optimises the measured metrics, improves resource usage, and achieves system scalability.
id cern-2874778
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
report_number CERN-THESIS-2023-185
oai oai:cds.cern.ch:2874778
title Scalable Data Processing Model of the ALICE Experiment in the Cloud
topic Computing and Computers
url http://cds.cern.ch/record/2874778
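Illustrative note: the description above outlines an Evolution Strategies (ES) approach to assigning tasks to heterogeneous virtual machines, evaluated in the CloudSim simulator with SWF-derived workloads. The standalone Java sketch below shows only the general (mu + lambda) ES idea for a task-to-VM assignment that minimises makespan; the task lengths, VM speeds, ES parameters, and class name are hypothetical and are not taken from the thesis or from CloudSim. The Longest/Shortest Job First broker-policy variants mentioned in the abstract would additionally order task submission by length, which is not modelled here.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Minimal (mu + lambda) Evolution Strategies sketch for mapping tasks to
 * heterogeneous VMs while minimising makespan. Illustrative only: the task
 * lengths, VM speeds and ES parameters are hypothetical and do not come
 * from the thesis or from CloudSim.
 */
public class EsTaskScheduler {

    static final Random RNG = new Random(42);

    // Hypothetical workload: task lengths in million instructions (MI)
    // and VM capacities in MIPS, standing in for an SWF-derived workload.
    static final double[] TASK_MI = {4000, 1200, 2600, 800, 5200, 3100, 900, 1500, 2200, 4700};
    static final double[] VM_MIPS = {1000, 2000, 2500};

    /** Makespan of an assignment: tasks mapped to the same VM run sequentially. */
    static double makespan(int[] assignment) {
        double[] finish = new double[VM_MIPS.length];
        for (int t = 0; t < assignment.length; t++) {
            finish[assignment[t]] += TASK_MI[t] / VM_MIPS[assignment[t]];
        }
        return Arrays.stream(finish).max().orElse(0.0);
    }

    /** Mutation: reassign each task to a random VM with a small probability. */
    static int[] mutate(int[] parent, double rate) {
        int[] child = parent.clone();
        for (int t = 0; t < child.length; t++) {
            if (RNG.nextDouble() < rate) {
                child[t] = RNG.nextInt(VM_MIPS.length);
            }
        }
        return child;
    }

    public static void main(String[] args) {
        int mu = 5, lambda = 20, generations = 200;
        double mutationRate = 0.2;

        // Initial population of random assignments (task index -> VM index).
        int[][] population = new int[mu][TASK_MI.length];
        for (int[] individual : population) {
            for (int t = 0; t < individual.length; t++) {
                individual[t] = RNG.nextInt(VM_MIPS.length);
            }
        }

        for (int g = 0; g < generations; g++) {
            // Generate lambda offspring by mutating randomly chosen parents.
            int[][] pool = new int[mu + lambda][];
            System.arraycopy(population, 0, pool, 0, mu);
            for (int i = 0; i < lambda; i++) {
                pool[mu + i] = mutate(population[RNG.nextInt(mu)], mutationRate);
            }
            // (mu + lambda) selection: keep the mu assignments with the lowest makespan.
            Arrays.sort(pool, (a, b) -> Double.compare(makespan(a), makespan(b)));
            population = Arrays.copyOfRange(pool, 0, mu);
        }

        System.out.println("Best assignment: " + Arrays.toString(population[0]));
        System.out.println("Makespan: " + makespan(population[0]));
    }
}
```

Compiling and running the class prints the best assignment found and its makespan; a Genetic Algorithm baseline, as used for comparison in the thesis, would differ mainly by adding recombination between parents.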