A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management

This paper presents a system to predict future data popularity for data-intensive systems, such as ATLAS distributed data management (DDM). Using these predictions it is possible to make a better distribution of data, helping to reduce the waiting time for jobs using this data. This system is b...

Bibliographic Details
Main authors: Beermann, T, Stewart, G A, Maettig, P
Language: eng
Published: 2014
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/1690335
_version_ 1780935599384952832
author Beermann, T
Stewart, G A
Maettig, P
author_facet Beermann, T
Stewart, G A
Maettig, P
author_sort Beermann, T
collection CERN
description This paper presents a system to predict future data popularity for data-intensive systems, such as ATLAS distributed data management (DDM). Using these predictions it is possible to make a better distribution of data, helping to reduce the waiting time for jobs using this data. This system is based on a tracer infrastructure that monitors and stores historical data accesses and is used to create popularity reports. These reports provide detailed summaries of past data accesses, including information about the accessed files, the users involved and the sites. From this past data it is then possible to make near-term forecasts of future data popularity. The prediction system introduced in this paper makes use of both simple prediction methods and predictions made by neural networks. The best prediction method depends on the type of data, and the data is carefully filtered for use in either system. The second part of the paper introduces a system that places data based on the predictions. This is a two-phase process: in the first phase, space is freed by removing unpopular replicas; in the second, new replicas of popular datasets are created. The creation of new replicas is limited by certain constraints: only a limited amount of space is available, and creating replicas involves transfers that use bandwidth. Furthermore, the benefit of each replica is different. The goal is to maximise the global benefit while respecting the constraints. The final part shows the evaluation of this method using a grid simulator. The simulator is able to replay a workload on different data distributions while measuring the job waiting time. We show how job waiting time can be reduced based on accurate predictions of future accesses.
id cern-1690335
institution European Organization for Nuclear Research
language eng
publishDate 2014
record_format invenio
spelling cern-1690335 2022-08-10T20:42:51Z http://cds.cern.ch/record/1690335 eng Beermann, T Stewart, G A Maettig, P A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management Detectors and Experimental Techniques ATL-SOFT-PROC-2014-001 oai:cds.cern.ch:1690335 2014-03-27
spellingShingle Detectors and Experimental Techniques
Beermann, T
Stewart, G A
Maettig, P
A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management
title A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management
title_full A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management
title_fullStr A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management
title_full_unstemmed A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management
title_short A Popularity Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management
title_sort popularity based prediction and data redistribution tool for atlas distributed data management
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/1690335
work_keys_str_mv AT beermannt apopularitybasedpredictionanddataredistributiontoolforatlasdistributeddatamanagement
AT stewartga apopularitybasedpredictionanddataredistributiontoolforatlasdistributeddatamanagement
AT maettigp apopularitybasedpredictionanddataredistributiontoolforatlasdistributeddatamanagement
AT beermannt popularitybasedpredictionanddataredistributiontoolforatlasdistributeddatamanagement
AT stewartga popularitybasedpredictionanddataredistributiontoolforatlasdistributeddatamanagement
AT maettigp popularitybasedpredictionanddataredistributiontoolforatlasdistributeddatamanagement
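
The two-phase redistribution outlined in the description (free space by removing unpopular replicas, then create replicas of popular datasets within space and bandwidth limits) can be illustrated with a minimal sketch. This is not the authors' implementation: the Replica class, the threshold, the use of predicted accesses as the "benefit" and the greedy benefit-per-gigabyte rule are all assumptions made for illustration, and a greedy choice is a heuristic that does not guarantee the maximum global benefit.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical record of one dataset copy at one site."""
    dataset: str
    site: str
    size_gb: float
    predicted_accesses: float  # assumed popularity forecast, used as benefit


def free_space(replicas, threshold):
    """Phase 1: remove replicas predicted to be unpopular.

    The last remaining copy of a dataset is never removed
    (an assumption of this sketch, not stated in the record).
    """
    copies = {}
    for r in replicas:
        copies[r.dataset] = copies.get(r.dataset, 0) + 1

    kept, freed_gb = [], 0.0
    # Visit the least popular replicas first.
    for r in sorted(replicas, key=lambda r: r.predicted_accesses):
        if r.predicted_accesses < threshold and copies[r.dataset] > 1:
            copies[r.dataset] -= 1
            freed_gb += r.size_gb
        else:
            kept.append(r)
    return kept, freed_gb


def place_replicas(candidates, space_gb, bandwidth_gb):
    """Phase 2: greedily add candidate replicas with the highest
    predicted accesses per gigabyte while both budgets allow it."""
    chosen = []
    ranked = sorted(candidates,
                    key=lambda c: c.predicted_accesses / c.size_gb,
                    reverse=True)
    for c in ranked:
        if c.size_gb <= space_gb and c.size_gb <= bandwidth_gb:
            chosen.append(c)
            space_gb -= c.size_gb       # storage consumed at the destination
            bandwidth_gb -= c.size_gb   # data that must be transferred
    return chosen


# Toy example with made-up numbers.
existing = [
    Replica("A", "S1", 10, 0.0),
    Replica("A", "S2", 10, 0.0),
    Replica("B", "S1", 5, 50.0),
]
kept, freed_gb = free_space(existing, threshold=1.0)

candidates = [
    Replica("B", "S2", 5, 50.0),
    Replica("C", "S2", 8, 16.0),
    Replica("D", "S2", 4, 4.0),
]
chosen = place_replicas(candidates, space_gb=10, bandwidth_gb=10)
```

In the toy example, one of the two unused copies of dataset A is removed in the first phase, and the second phase selects the candidates for B and D because the candidate for C no longer fits in the remaining budget.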