A study of dynamic data placement for ATLAS distributed data management

This contribution presents a study on the applicability and usefulness of dynamic data placement methods for data-intensive systems, such as ATLAS distributed data management (DDM). In this system the jobs are sent to the data, therefore having a good distribution of data is significant. Ways of for...

Bibliographic Details
Main authors: Beermann, Thomas Alfons; Stewart, Graeme; Maettig, Peter
Language: eng
Published: 2015
Subjects: Particle Physics - Experiment
Online access: https://dx.doi.org/10.1088/1742-6596/664/3/032002
http://cds.cern.ch/record/2016442
_version_ 1780946699147018240
author Beermann, Thomas Alfons
Stewart, Graeme
Maettig, Peter
author_facet Beermann, Thomas Alfons
Stewart, Graeme
Maettig, Peter
author_sort Beermann, Thomas Alfons
collection CERN
description This contribution presents a study on the applicability and usefulness of dynamic data placement methods for data-intensive systems, such as ATLAS distributed data management (DDM). In this system the jobs are sent to the data, therefore having a good distribution of data is significant. Ways of forecasting workload patterns are examined which then are used to redistribute data to achieve a better overall utilisation of computing resources and to reduce waiting time for jobs before they can run on the grid. This method is based on a tracer infrastructure that is able to monitor and store historical data accesses and which is used to create popularity reports. These reports provide detailed summaries about data accesses in the past, including information about the accessed files, the involved users and the sites. From this past data it is possible to then make near-term forecasts for data popularity in the future. This study evaluates simple prediction methods as well as more complex methods like neural networks. Based on the outcome of the predictions a redistribution algorithm deletes unused replicas and adds new replicas for potentially popular datasets. Finally, a grid simulator is used to examine the effects of the redistribution. The simulator replays workload on different data distributions while measuring the job waiting time and site usage. The study examines how the average waiting time is affected by the amount of data that is moved, how it differs for the various forecasting methods and how that compares to the optimal data distribution.
id cern-2016442
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2015
record_format invenio
spelling cern-2016442 2022-08-10T12:54:57Z doi:10.1088/1742-6596/664/3/032002 http://cds.cern.ch/record/2016442 eng Beermann, Thomas Alfons; Stewart, Graeme; Maettig, Peter. A study of dynamic data placement for ATLAS distributed data management. Particle Physics - Experiment. This contribution presents a study on the applicability and usefulness of dynamic data placement methods for data-intensive systems, such as ATLAS distributed data management (DDM). In this system the jobs are sent to the data, therefore having a good distribution of data is significant. Ways of forecasting workload patterns are examined which then are used to redistribute data to achieve a better overall utilisation of computing resources and to reduce waiting time for jobs before they can run on the grid. This method is based on a tracer infrastructure that is able to monitor and store historical data accesses and which is used to create popularity reports. These reports provide detailed summaries about data accesses in the past, including information about the accessed files, the involved users and the sites. From this past data it is possible to then make near-term forecasts for data popularity in the future. This study evaluates simple prediction methods as well as more complex methods like neural networks. Based on the outcome of the predictions a redistribution algorithm deletes unused replicas and adds new replicas for potentially popular datasets. Finally, a grid simulator is used to examine the effects of the redistribution. The simulator replays workload on different data distributions while measuring the job waiting time and site usage. The study examines how the average waiting time is affected by the amount of data that is moved, how it differs for the various forecasting methods and how that compares to the optimal data distribution. ATL-SOFT-PROC-2015-035 oai:cds.cern.ch:2016442 2015-05-15
spellingShingle Particle Physics - Experiment
Beermann, Thomas Alfons
Stewart, Graeme
Maettig, Peter
A study of dynamic data placement for ATLAS distributed data management
title A study of dynamic data placement for ATLAS distributed data management
topic Particle Physics - Experiment
url https://dx.doi.org/10.1088/1742-6596/664/3/032002
http://cds.cern.ch/record/2016442