
Forecasting Network Throughput of Remote Data Access in Computing Grids

Computing grids are key enablers of computational science. Researchers from many fields (High Energy Physics, Bioinformatics, Climatology, etc.) employ grids for execution of distributed computational jobs. These computing workloads are typically data-intensive. The current state-of-the-art approach...


Bibliographic Details
Main authors: Begy, Volodimir; Barisits, Martin-Stefan; Lassnig, Mario
Language: eng
Published: 2018
Subjects: Particle Physics - Experiment; Computing and Computers
Online access: https://dx.doi.org/10.1016/j.jocs.2020.101158
http://cds.cern.ch/record/2621616
author Begy, Volodimir
Barisits, Martin-Stefan
Lassnig, Mario
collection CERN
description Computing grids are key enablers of computational science. Researchers from many fields (High Energy Physics, Bioinformatics, Climatology, etc.) employ grids for execution of distributed computational jobs. These computing workloads are typically data-intensive. The current state-of-the-art approach for data access in grids is data placement: a job is scheduled to run at a specific data center, and its execution commences only once the complete input data has been transferred there. An alternative approach is remote data access: a job may stream the input data directly from arbitrary storage elements. Remote data access brings two innovative benefits: (1) the jobs can be executed asynchronously with respect to the data transfer; (2) when combined with data placement on the policy level, it can aid in the optimization of the network load, since these two data access methodologies partially exhibit nonoverlapping bottlenecks. However, in order to employ this technique systematically, the properties of its network throughput need to be studied carefully. This paper presents experimentally identified parameters of remote data access throughput, a statistically tested formalization of these parameters, and a derived throughput forecasting model. The model is applicable to large computing workloads, is robust with respect to arbitrary dynamic changes in the grid infrastructure, and exhibits a long-term prediction horizon. Its purpose is to assist various stakeholders of the grid in decision-making related to data access patterns. This work is based on measurements taken on the Worldwide LHC Computing Grid at CERN.
id cern-2621616
institution European Organization for Nuclear Research
language eng
publishDate 2018
record_format invenio
spelling cern-2621616 2022-01-25T10:52:09Z doi:10.1016/j.jocs.2020.101158 http://cds.cern.ch/record/2621616 eng ATL-SOFT-PROC-2018-001 oai:cds.cern.ch:2621616 2018-06-04
title Forecasting Network Throughput of Remote Data Access in Computing Grids
topic Particle Physics - Experiment
Computing and Computers