A popularity prediction and dynamic data replication study for the ATLAS distributed data management
ATLAS (A Toroidal LHC Apparatus) is one of several experiments at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. The LHC is the largest and most powerful particle accelerator in the world and is able to operate at unprecedented energy levels. Because of this, ATLAS is able to...
Main author: | Beermann, Thomas |
---|---|
Language: | eng |
Published: | 2017 |
Subjects: | Computing and Computers |
Online access: | http://cds.cern.ch/record/2276011 |
_version_ | 1780955185984569344 |
---|---|
author | Beermann, Thomas |
author_facet | Beermann, Thomas |
author_sort | Beermann, Thomas |
collection | CERN |
description | ATLAS (A Toroidal LHC Apparatus) is one of several experiments at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. The LHC is the largest and most powerful particle accelerator in the world and is able to operate at unprecedented energy levels. Because of this, ATLAS is able to observe physical phenomena and massive particles that were not observable before. The detectors at the LHC create vast amounts of data that need to be accessible to physicists for their analysis. For this reason, a worldwide computing grid (WLCG) was created that connects hundreds of computing centres across the planet. The experiments constantly create new data, but older data has to be kept as well. The available resources are limited, which requires smart management of the storage space. This thesis presents a method to dynamically create new replicas and delete unused replicas based on a prediction of data popularity, with the aim of improving user waiting times. The first part gives a general introduction to the LHC, ATLAS and the WLCG, a description of the computing model and systems used by the ATLAS experiment, and finally the motivation for this work. The second part concentrates on the popularity prediction, introducing how the access data from the grid can be transformed for use with different prediction methods. The evaluation describes typical usage patterns, followed by a discussion of the advantages and disadvantages of the prediction algorithms, which leads to the hybrid prediction, where two methods are combined to improve the results. The third part first introduces the redistribution algorithms, which use the popularity prediction to delete replicas and add new ones. It then describes a grid simulator that was developed to study the impact of the redistribution on different workloads. Finally, the evaluation shows the impact of the redistribution on waiting times for user analysis jobs on the grid. The last part summarises the results and gives an outlook for further developments. |
id | cern-2276011 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2017 |
record_format | invenio |
spelling | cern-2276011; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/2276011; eng; Beermann, Thomas; A popularity prediction and dynamic data replication study for the ATLAS distributed data management; Computing and Computers; CERN-THESIS-2017-096; oai:cds.cern.ch:2276011; 2017-07-27T14:50:35Z |
spellingShingle | Computing and Computers Beermann, Thomas A popularity prediction and dynamic data replication study for the ATLAS distributed data management |
title | A popularity prediction and dynamic data replication study for the ATLAS distributed data management |
title_full | A popularity prediction and dynamic data replication study for the ATLAS distributed data management |
title_fullStr | A popularity prediction and dynamic data replication study for the ATLAS distributed data management |
title_full_unstemmed | A popularity prediction and dynamic data replication study for the ATLAS distributed data management |
title_short | A popularity prediction and dynamic data replication study for the ATLAS distributed data management |
title_sort | popularity prediction and dynamic data replication study for the atlas distributed data management |
topic | Computing and Computers |
url | http://cds.cern.ch/record/2276011 |
work_keys_str_mv | AT beermannthomas apopularitypredictionanddynamicdatareplicationstudyfortheatlasdistributeddatamanagement AT beermannthomas popularitypredictionanddynamicdatareplicationstudyfortheatlasdistributeddatamanagement |