
Modelling High-Energy Physics Data Transfers

In scientific data management systems like Rucio[1], knowing at submission time when a file transfer will finish opens a wide range of opportunities to improve the scheduling techniques currently in use, and therefore to optimize the use of the available resourc...

Bibliographic Details
Main authors: Bogado, Joaquin; Monticelli, Fernando; Diaz, Javier; Lassnig, Mario; Vukotic, Ilija
Language: eng
Published: 2018
Subjects: Computing and Computers; Detectors and Experimental Techniques
Online access: https://dx.doi.org/10.1109/eScience.2018.00081
http://cds.cern.ch/record/2800916
_version_ 1780972664280580096
author Bogado, Joaquin
Monticelli, Fernando
Diaz, Javier
Lassnig, Mario
Vukotic, Ilija
collection CERN
description In scientific data management systems like Rucio[1], knowing at submission time when a file transfer will finish opens a wide range of opportunities to improve the scheduling techniques currently in use, and therefore to optimize the use of the available resources. We developed a model that predicts the number of pending transfers in a file transfer system[2] queue at a given time and therefore, with some level of confidence, the estimated time to completion of each transfer. Using data analytics methods on historical data, we were also able to predict the average rate of a transfer based only on its size. The models use the submission timestamp, i.e., the moment the transfer enters the data management system, and the size of the transfer in bytes to calculate the starting timestamp, i.e., the moment the transfer begins to use the network, and the finishing timestamp. The rate of each transfer must be known or approximated, and the limits on concurrent active transfers must also be known. We obtained the rate approximation by fitting the function described in Equation (1) to 500 random transfers from the first dataset, using ordinary least squares regression from the scipy optimize package[4].
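The queue model summarized in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single first-in, first-out queue, a fixed limit on concurrent active transfers, and a constant per-transfer rate that does not depend on the other active transfers. The names `Transfer`, `simulate`, and `pending_at` are hypothetical and introduced only for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Transfer:
    submitted: float  # submission timestamp in seconds
    size: int         # size in bytes
    rate: float       # bytes per second, known or approximated


def simulate(transfers, max_active):
    """Return (submitted, start, finish) tuples in submission order.

    Assumption: FIFO service with at most `max_active` concurrent
    transfers, each running at its own constant rate.
    """
    active = []    # min-heap with the finish times of active transfers
    schedule = []
    for t in sorted(transfers, key=lambda tr: tr.submitted):
        # Release the slots of transfers that finished before this submission.
        while active and active[0] <= t.submitted:
            heapq.heappop(active)
        if len(active) < max_active:
            start = t.submitted            # a slot is free: start immediately
        else:
            start = heapq.heappop(active)  # wait for the earliest finish
        finish = start + t.size / t.rate
        heapq.heappush(active, finish)
        schedule.append((t.submitted, start, finish))
    return schedule


def pending_at(schedule, when):
    """Count transfers that are submitted but not yet started at `when`."""
    return sum(1 for submitted, start, _ in schedule
               if submitted <= when < start)


# Hypothetical input: four transfers, two concurrent slots.
demo = [
    Transfer(submitted=0, size=100, rate=10.0),
    Transfer(submitted=1, size=50, rate=10.0),
    Transfer(submitted=2, size=30, rate=10.0),
    Transfer(submitted=3, size=20, rate=10.0),
]
demo_schedule = simulate(demo, max_active=2)
for row in demo_schedule:
    print(row)
print("pending at t=4:", pending_at(demo_schedule, 4))
```

In this sketch the estimated time to completion of a transfer is its finish time minus its submission time. A min-heap of finish times keeps each scheduling step cheap, because only the earliest-finishing active transfer matters when a slot has to be freed.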
id cern-2800916
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2018
record_format invenio
title Modelling High-Energy Physics Data Transfers
topic Computing and Computers
Detectors and Experimental Techniques
url https://dx.doi.org/10.1109/eScience.2018.00081
http://cds.cern.ch/record/2800916