Caching for dataset-based workloads with heterogeneous file sizes
Caching can effectively reduce the cost of serving content and improve the user experience. In this paper, we explore the benefits of caching for existing scientific workloads, taking the Worldwide LHC (Large Hadron Collider) Computing Grid as an example. It is a globally distributed system that sto...
Main authors: | Olga Chuchuk, Giovanni Neglia, Markus Schulz, Dirk Duellmann |
---|---|
Language: | eng |
Published: | 2022 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.22323/1.415.0009 http://cds.cern.ch/record/2861084 |
_version_ | 1780977796538957824 |
---|---|
author | Chuchuk, Olga Neglia, Giovanni Schulz, Markus Duellmann, Dirk |
author_facet | Chuchuk, Olga Neglia, Giovanni Schulz, Markus Duellmann, Dirk |
author_sort | Chuchuk, Olga |
collection | CERN |
description | Caching can effectively reduce the cost of serving content and improve the user experience. In this paper, we explore the benefits of caching for existing scientific workloads, taking the Worldwide LHC (Large Hadron Collider) Computing Grid as an example. It is a globally distributed system that stores and processes multiple hundred petabytes of data and serves the needs of thousands of scientists around the globe.
Scientific computation differs from other applications like video streaming as file sizes vary from a few bytes to terabytes and logical links between the files affect user access patterns. These factors profoundly influence caches' performance and, therefore, should be carefully analyzed to select which caching policy to deploy or to design new ones.
In this work, we study how the hierarchical organization of the LHC physics data into files and groups of files called datasets affects the request patterns. We then propose new caching policies that exploit dataset-specific knowledge and compare them with file-based ones. Moreover, we show that limited connectivity between the computing and storage sites leads to the delayed hits phenomenon and estimate the consequent reduction in the potential benefits of caching. |
id | cern-2861084 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2022 |
record_format | invenio |
spelling | cern-2861084 2023-06-07T18:56:34Z doi:10.22323/1.415.0009 http://cds.cern.ch/record/2861084 eng Chuchuk, Olga Neglia, Giovanni Schulz, Markus Duellmann, Dirk Caching for dataset-based workloads with heterogeneous file sizes Computing and Computers Caching can effectively reduce the cost of serving content and improve the user experience. In this paper, we explore the benefits of caching for existing scientific workloads, taking the Worldwide LHC (Large Hadron Collider) Computing Grid as an example. It is a globally distributed system that stores and processes multiple hundred petabytes of data and serves the needs of thousands of scientists around the globe. Scientific computation differs from other applications like video streaming as file sizes vary from a few bytes to terabytes and logical links between the files affect user access patterns. These factors profoundly influence caches' performance and, therefore, should be carefully analyzed to select which caching policy to deploy or to design new ones. In this work, we study how the hierarchical organization of the LHC physics data into files and groups of files called datasets affects the request patterns. We then propose new caching policies that exploit dataset-specific knowledge and compare them with file-based ones. Moreover, we show that limited connectivity between the computing and storage sites leads to the delayed hits phenomenon and estimate the consequent reduction in the potential benefits of caching. oai:cds.cern.ch:2861084 2022 |
spellingShingle | Computing and Computers Chuchuk, Olga Neglia, Giovanni Schulz, Markus Duellmann, Dirk Caching for dataset-based workloads with heterogeneous file sizes |
title | Caching for dataset-based workloads with heterogeneous file sizes |
title_full | Caching for dataset-based workloads with heterogeneous file sizes |
title_fullStr | Caching for dataset-based workloads with heterogeneous file sizes |
title_full_unstemmed | Caching for dataset-based workloads with heterogeneous file sizes |
title_short | Caching for dataset-based workloads with heterogeneous file sizes |
title_sort | caching for dataset-based workloads with heterogeneous file sizes |
topic | Computing and Computers |
url | https://dx.doi.org/10.22323/1.415.0009 http://cds.cern.ch/record/2861084 |
work_keys_str_mv | AT chuchukolga cachingfordatasetbasedworkloadswithheterogeneousfilesizes AT negliagiovanni cachingfordatasetbasedworkloadswithheterogeneousfilesizes AT schulzmarkus cachingfordatasetbasedworkloadswithheterogeneousfilesizes AT duellmanndirk cachingfordatasetbasedworkloadswithheterogeneousfilesizes |