Modelling High-Energy Physics Data Transfers
Main authors: | , , , , |
---|---|
Language: | eng |
Published: | 2018 |
Subjects: | |
Online access: | https://dx.doi.org/10.1109/eScience.2018.00081 http://cds.cern.ch/record/2800916 |
Summary: | In scientific data management systems such as Rucio [1], knowing at submission time when a file transfer will finish opens a wide range of opportunities to improve the scheduling techniques currently in use, and therefore to optimize the use of the available resources. We developed a model that can predict the number of pending transfers in a file transfer system [2] queue at a given time and therefore, with some level of confidence, the estimated time to completion of each transfer. Using data analytics methods on historical data, we also managed to predict the average rate of transfers based only on their sizes. The models use the submission timestamp, i.e., the moment the transfer enters the data management system, and the size of the transfer in bytes to calculate the starting timestamp, i.e., the moment the transfer begins using the network, and the finishing timestamp. The rate of each transfer must be known or approximated, and the limit on concurrent active transfers must also be known. We obtained the rate approximation by fitting the function described in Equation (1) to 500 random transfers from the first dataset, using ordinary least-squares regression from the scipy.optimize package [4]. |
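The summary describes the model only in words, and Equation (1) is not reproduced in this record. The Python sketch below illustrates the general idea under explicit assumptions. It swaps in a hypothetical linear duration model, duration = a + b * size (so the implied average rate is size / (a + b * size)), fitted with closed-form ordinary least squares from the standard library instead of scipy.optimize. It also assumes a single first-in, first-out queue with a fixed limit of concurrent active transfers, and it counts a transfer as "pending" when it has been submitted but not yet started. The names `Transfer`, `fit_duration_model`, `schedule` and `pending_at` are illustrative only and are not part of Rucio or the file transfer system.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Transfer:
    submitted: float        # submission timestamp (seconds)
    size: float             # transfer size (bytes)
    started: float = 0.0    # predicted start timestamp, filled in by schedule()
    finished: float = 0.0   # predicted finish timestamp, filled in by schedule()


def fit_duration_model(sizes, durations):
    """Closed-form ordinary least-squares fit of duration = a + b * size.

    Hypothetical stand-in for Equation (1); the average rate implied by
    the fit is size / (a + b * size).
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("need at least two distinct sizes to fit the model")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, durations))
    b = sxy / sxx              # seconds per byte
    a = mean_y - b * mean_x    # fixed per-transfer overhead (seconds)
    return a, b


def schedule(transfers, max_active, a, b):
    """Predict start/finish times for a FIFO queue with max_active slots."""
    # Min-heap holding the time at which each concurrency slot becomes free.
    slots = [float("-inf")] * max_active
    for tr in sorted(transfers, key=lambda t: t.submitted):
        free_at = heapq.heappop(slots)            # earliest available slot
        tr.started = max(tr.submitted, free_at)   # wait in the queue if needed
        tr.finished = tr.started + a + b * tr.size
        heapq.heappush(slots, tr.finished)
    return transfers


def pending_at(transfers, when):
    """Number of transfers submitted but not yet started at time `when`."""
    return sum(1 for tr in transfers if tr.submitted <= when < tr.started)


# Toy usage: fit on four made-up transfers, then schedule three with 2 slots.
a, b = fit_duration_model([10, 20, 30, 40], [3, 5, 7, 9])  # a is 1.0, b is 0.2
queue = schedule([Transfer(0, 10), Transfer(0, 20), Transfer(1, 10)],
                 max_active=2, a=a, b=b)
print(pending_at(queue, 2))  # prints 1: the third transfer waits until t = 3
```

Under these assumptions, the estimated time to completion of each transfer is `finished - submitted`. Replacing the linear duration model with the actual Equation (1), or the closed-form fit with a scipy.optimize routine, would change only how durations are computed, not the queue logic.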