
A popularity prediction and dynamic data replication study for the ATLAS distributed data management

ATLAS (A Toroidal LHC Apparatus) is one of several experiments at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. The LHC is the largest and most powerful particle accelerator in the world and is able to operate at unprecedented energy levels. Because of this, ATLAS is able to...


Bibliographic Details
Main author: Beermann, Thomas
Language: eng
Published: 2017
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/2276011
_version_ 1780955185984569344
author Beermann, Thomas
author_facet Beermann, Thomas
author_sort Beermann, Thomas
collection CERN
description ATLAS (A Toroidal LHC Apparatus) is one of several experiments at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. The LHC is the largest and most powerful particle accelerator in the world and is able to operate at unprecedented energy levels. Because of this, ATLAS is able to observe physical phenomena and massive particles that were not observable before. The detectors at the LHC create vast amounts of data that need to be accessible to physicists for their analysis. For this reason, a worldwide computing grid (WLCG) was created that connects hundreds of computing centres across the planet. The experiments constantly create new data, but older data has to be kept as well. The available resources are limited, which requires smart management of the storage space. This thesis presents a method to dynamically create new replicas and delete unused replicas based on a prediction of data popularity to improve user waiting times. The first part gives a general introduction to the LHC, ATLAS and the WLCG, a description of the computing model and systems used by the ATLAS experiment, and finally the motivation for this work. The second part concentrates on the popularity prediction, introducing how the access data from the grid can be transformed to be used with different prediction methods. The evaluation describes typical usage patterns, followed by a discussion of the advantages and disadvantages of the prediction algorithms, which then leads to the hybrid prediction, where two methods are combined to improve the results. The third part first introduces the redistribution algorithm, which then uses the popularity prediction to delete replicas and add new ones. After that, a grid simulator is described that was developed to study the impact of the redistribution on different workloads. Finally, the evaluation shows the impact of the redistribution on waiting times for user analysis jobs on the grid. The last part summarises the results and gives an outlook for further developments.
id cern-2276011
institution European Organization for Nuclear Research
language eng
publishDate 2017
record_format invenio
spelling cern-2276011
2019-09-30T06:29:59Z
http://cds.cern.ch/record/2276011
eng
Beermann, Thomas
A popularity prediction and dynamic data replication study for the ATLAS distributed data management
Computing and Computers
ATLAS (A Toroidal LHC Apparatus) is one of several experiments at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. The LHC is the largest and most powerful particle accelerator in the world and is able to operate at unprecedented energy levels. Because of this, ATLAS is able to observe physical phenomena and massive particles that were not observable before. The detectors at the LHC create vast amounts of data that need to be accessible to physicists for their analysis. For this reason, a worldwide computing grid (WLCG) was created that connects hundreds of computing centres across the planet. The experiments constantly create new data, but older data has to be kept as well. The available resources are limited, which requires smart management of the storage space. This thesis presents a method to dynamically create new replicas and delete unused replicas based on a prediction of data popularity to improve user waiting times. The first part gives a general introduction to the LHC, ATLAS and the WLCG, a description of the computing model and systems used by the ATLAS experiment, and finally the motivation for this work. The second part concentrates on the popularity prediction, introducing how the access data from the grid can be transformed to be used with different prediction methods. The evaluation describes typical usage patterns, followed by a discussion of the advantages and disadvantages of the prediction algorithms, which then leads to the hybrid prediction, where two methods are combined to improve the results. The third part first introduces the redistribution algorithm, which then uses the popularity prediction to delete replicas and add new ones. After that, a grid simulator is described that was developed to study the impact of the redistribution on different workloads. Finally, the evaluation shows the impact of the redistribution on waiting times for user analysis jobs on the grid. The last part summarises the results and gives an outlook for further developments.
CERN-THESIS-2017-096
oai:cds.cern.ch:2276011
2017-07-27T14:50:35Z
spellingShingle Computing and Computers
Beermann, Thomas
A popularity prediction and dynamic data replication study for the ATLAS distributed data management
title A popularity prediction and dynamic data replication study for the ATLAS distributed data management
title_full A popularity prediction and dynamic data replication study for the ATLAS distributed data management
title_fullStr A popularity prediction and dynamic data replication study for the ATLAS distributed data management
title_full_unstemmed A popularity prediction and dynamic data replication study for the ATLAS distributed data management
title_short A popularity prediction and dynamic data replication study for the ATLAS distributed data management
title_sort popularity prediction and dynamic data replication study for the atlas distributed data management
topic Computing and Computers
url http://cds.cern.ch/record/2276011
work_keys_str_mv AT beermannthomas apopularitypredictionanddynamicdatareplicationstudyfortheatlasdistributeddatamanagement
AT beermannthomas popularitypredictionanddynamicdatareplicationstudyfortheatlasdistributeddatamanagement