Modelling High-Energy Physics Data Transfers
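Main authors: | Bogado, Joaquin; Monticelli, Fernando; Diaz, Javier; Lassnig, Mario; Vukotic, Ilija |
---|---|
Language: | eng |
Published: | 2018 |
Subjects: | Computing and Computers; Detectors and Experimental Techniques |
Online access: | https://dx.doi.org/10.1109/eScience.2018.00081 http://cds.cern.ch/record/2800916 |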
author | Bogado, Joaquin; Monticelli, Fernando; Diaz, Javier; Lassnig, Mario; Vukotic, Ilija |
---|---|
collection | CERN |
description | In scientific data management systems such as Rucio[1], knowing at submission time when a file transfer will finish opens a wide range of opportunities to improve the scheduling techniques currently in use, and therefore to optimize the use of the available resources. We developed a model that predicts the number of pending transfers in a file transfer system[2] queue at a given time and, from that, estimates with some level of confidence the time to completion of each transfer. Using data analytics methods on historical data, we were also able to predict the average transfer rate based only on transfer size. The models take the submission timestamp (the moment the transfer enters the data management system) and the transfer size in bytes, and from them calculate the starting timestamp (when the transfer begins using the network) and the finishing timestamp. The rate of each transfer must be known or approximated, and the limits on concurrent active transfers must also be known. We obtained the rate approximation by fitting the function described in Equation (1) to 500 random transfers from the first dataset, using ordinary least squares regression from the scipy optimize package[4]. |
id | cern-2800916 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2018 |
record_format | invenio |
title | Modelling High-Energy Physics Data Transfers |
topic | Computing and Computers; Detectors and Experimental Techniques |
url | https://dx.doi.org/10.1109/eScience.2018.00081 http://cds.cern.ch/record/2800916 |
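The queue model summarized in the description can be illustrated with a minimal Python sketch. It assumes a first-come, first-served queue in which each transfer runs at its own fixed rate; the names `Transfer`, `simulate`, `pending_at`, and `max_active` are hypothetical and do not come from Rucio or the file transfer system. Given submission timestamps, sizes in bytes, rates, and the limit on concurrent active transfers, it computes start and finish timestamps. The number of pending transfers at a given time and each transfer's estimated time to complete both follow from those timestamps.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Transfer:
    submitted: float  # submission timestamp (s): when the transfer enters the system
    size: float       # transfer size in bytes
    rate: float       # known or approximated average rate (bytes/s)


def simulate(transfers, max_active):
    """Return (start, finish) timestamps for each transfer, in input order.

    Hypothetical queue discipline: first come, first served, with at most
    `max_active` concurrent transfers, each running at its own fixed rate.
    """
    # Each heap entry is the time at which one transfer slot becomes free.
    free_at = [float("-inf")] * max_active
    heapq.heapify(free_at)
    schedule = [None] * len(transfers)
    # Serve transfers in submission order (ties keep input order).
    for i in sorted(range(len(transfers)), key=lambda k: transfers[k].submitted):
        t = transfers[i]
        slot_free = heapq.heappop(free_at)   # earliest slot to free up
        start = max(t.submitted, slot_free)  # wait in the queue if no slot is free
        finish = start + t.size / t.rate     # time on the network at the given rate
        heapq.heappush(free_at, finish)
        schedule[i] = (start, finish)
    return schedule


def pending_at(transfers, schedule, when):
    """Count transfers submitted but not yet started at time `when`."""
    return sum(
        1 for t, (start, _) in zip(transfers, schedule) if t.submitted <= when < start
    )
```

For example, with `max_active=2`, a transfer submitted at t = 1 s behind two running transfers waits until the first slot frees at t = 10 s, and its estimated time to complete is `finish - submitted`. A heap of slot release times keeps each step at O(log max_active); the real system's scheduling policy may well differ from this FIFO assumption.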
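The rate approximation described in the record can be sketched in the same spirit. Equation (1) is not reproduced in this record, so the model below is a stand-in chosen only for illustration: each transfer is assumed to pay a fixed overhead `t0` (seconds) and then move data at a peak rate `r_max` (MB/s), giving an average rate of size / (t0 + size / r_max). The names `rate_model`, `fit_rate_model`, `n_sample`, and `seed` are hypothetical. The fit draws a random sample of transfers and uses `scipy.optimize.curve_fit`, which minimizes the unweighted sum of squared residuals.

```python
import numpy as np
from scipy.optimize import curve_fit


def rate_model(size_mb, r_max, t0):
    """Hypothetical rate-vs-size model (NOT the paper's Equation (1)).

    Assumes a fixed per-transfer overhead t0 (s) followed by data movement
    at a peak rate r_max (MB/s), so the average rate is size / (t0 + size / r_max).
    """
    return size_mb / (t0 + size_mb / r_max)


def fit_rate_model(sizes_mb, rates_mb_s, n_sample=500, seed=0):
    """Least-squares fit of rate_model on a random sample of transfers."""
    sizes_mb = np.asarray(sizes_mb, dtype=float)
    rates_mb_s = np.asarray(rates_mb_s, dtype=float)
    rng = np.random.default_rng(seed)
    n = min(n_sample, sizes_mb.size)
    idx = rng.choice(sizes_mb.size, size=n, replace=False)  # random subset
    p0 = (rates_mb_s[idx].max(), 1.0)  # rough starting guess for (r_max, t0)
    (r_max, t0), _ = curve_fit(
        rate_model,
        sizes_mb[idx],
        rates_mb_s[idx],
        p0=p0,
        bounds=([1e-9, 0.0], [np.inf, np.inf]),  # keep r_max > 0 and t0 >= 0
    )
    return r_max, t0
```

Sizes are expressed in MB rather than bytes so that both parameters have comparable magnitudes, which helps the optimizer converge. With real historical data the residuals would not vanish, and the fitted parameters should be read as an approximation of the average rate for a given size.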