
Caching for dataset-based workloads with heterogeneous file sizes



Bibliographic Details
Main authors:	Chuchuk, Olga; Neglia, Giovanni; Schulz, Markus; Duellmann, Dirk
Language:	eng
Published:	2022
Subjects:	Computing and Computers
Online access:	https://dx.doi.org/10.22323/1.415.0009 http://cds.cern.ch/record/2861084

author	Chuchuk, Olga; Neglia, Giovanni; Schulz, Markus; Duellmann, Dirk
author_sort	Chuchuk, Olga
collection	CERN
description	Caching can effectively reduce the cost of serving content and improve the user experience. In this paper, we explore the benefits of caching for existing scientific workloads, taking the Worldwide LHC (Large Hadron Collider) Computing Grid as an example. It is a globally distributed system that stores and processes multiple hundred petabytes of data and serves the needs of thousands of scientists around the globe. Scientific computation differs from other applications like video streaming as file sizes vary from a few bytes to terabytes and logical links between the files affect user access patterns. These factors profoundly influence caches' performance and, therefore, should be carefully analyzed to select which caching policy to deploy or to design new ones. In this work, we study how the hierarchical organization of the LHC physics data into files and groups of files called datasets affects the request patterns. We then propose new caching policies that exploit dataset-specific knowledge and compare them with file-based ones. Moreover, we show that limited connectivity between the computing and storage sites leads to the delayed hits phenomenon and estimate the consequent reduction in the potential benefits of caching.
id	cern-2861084
institution	European Organization for Nuclear Research (CERN)
language	eng
publishDate	2022
record_format	invenio
spelling	cern-2861084 | 2023-06-07T18:56:34Z | doi:10.22323/1.415.0009 | http://cds.cern.ch/record/2861084 | eng | oai:cds.cern.ch:2861084 | 2022
title	Caching for dataset-based workloads with heterogeneous file sizes
topic	Computing and Computers
url	https://dx.doi.org/10.22323/1.415.0009 http://cds.cern.ch/record/2861084

