Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS

Bibliographic Details
Main authors: Mosciatti, Simone; Lange, Clemens; Blomer, Jakob
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Big Data
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8144464/
https://www.ncbi.nlm.nih.gov/pubmed/34046587
http://dx.doi.org/10.3389/fdata.2021.673163
_version_ 1783696963260121088
author Mosciatti, Simone
Lange, Clemens
Blomer, Jakob
author_facet Mosciatti, Simone
Lange, Clemens
Blomer, Jakob
author_sort Mosciatti, Simone
collection PubMed
description The past years have shown a revolution in the way scientific workloads are being executed thanks to the wide adoption of software containers. These containers run largely isolated from the host system, ensuring that the development and execution environments are the same everywhere. This enables full reproducibility of the workloads and therefore also the associated scientific analyses performed. However, as the research software used becomes increasingly complex, the software images grow easily to sizes of multiple gigabytes. Downloading the full image onto every single compute node on which the containers are executed becomes unpractical. In this paper, we describe a novel way of distributing software images on the Kubernetes platform, with which the container can start before the entire image contents become available locally (so-called “lazy pulling”). Each file required for the execution is fetched individually and subsequently cached on-demand using the CernVM file system (CVMFS), enabling the execution of very large software images on potentially thousands of Kubernetes nodes with very little overhead. We present several performance benchmarks making use of typical high-energy physics analysis workloads.
format Online
Article
Text
id pubmed-8144464
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8144464 2021-05-26 Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS Mosciatti, Simone Lange, Clemens Blomer, Jakob Front Big Data Big Data The past years have shown a revolution in the way scientific workloads are being executed thanks to the wide adoption of software containers. These containers run largely isolated from the host system, ensuring that the development and execution environments are the same everywhere. This enables full reproducibility of the workloads and therefore also the associated scientific analyses performed. However, as the research software used becomes increasingly complex, the software images grow easily to sizes of multiple gigabytes. Downloading the full image onto every single compute node on which the containers are executed becomes unpractical. In this paper, we describe a novel way of distributing software images on the Kubernetes platform, with which the container can start before the entire image contents become available locally (so-called “lazy pulling”). Each file required for the execution is fetched individually and subsequently cached on-demand using the CernVM file system (CVMFS), enabling the execution of very large software images on potentially thousands of Kubernetes nodes with very little overhead. We present several performance benchmarks making use of typical high-energy physics analysis workloads. Frontiers Media S.A. 2021-05-11 /pmc/articles/PMC8144464/ /pubmed/34046587 http://dx.doi.org/10.3389/fdata.2021.673163 Text en Copyright © 2021 Mosciatti, Lange and Blomer. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Mosciatti, Simone
Lange, Clemens
Blomer, Jakob
Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS
title Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS
title_full Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS
title_fullStr Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS
title_full_unstemmed Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS
title_short Increasing the Execution Speed of Containerized Analysis Workflows Using an Image Snapshotter in Combination With CVMFS
title_sort increasing the execution speed of containerized analysis workflows using an image snapshotter in combination with cvmfs
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8144464/
https://www.ncbi.nlm.nih.gov/pubmed/34046587
http://dx.doi.org/10.3389/fdata.2021.673163
work_keys_str_mv AT mosciattisimone increasingtheexecutionspeedofcontainerizedanalysisworkflowsusinganimagesnapshotterincombinationwithcvmfs
AT langeclemens increasingtheexecutionspeedofcontainerizedanalysisworkflowsusinganimagesnapshotterincombinationwithcvmfs
AT blomerjakob increasingtheexecutionspeedofcontainerizedanalysisworkflowsusinganimagesnapshotterincombinationwithcvmfs