The Need for a Versioned Data Analysis Software Environment
Scientific results in high-energy physics and in many other fields often rely on complex software stacks. In order to support reproducibility and scrutiny of the results, it is good practice to use open source software and to cite software packages and versions. With ever-growing complexity of scien...
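Main Authors: | Blomer, Jakob; Berzano, Dario; Buncic, Predrag; Charalampidis, Ioannis; Ganis, Gerardo; Lestaris, George; Meusel, René |
---|---|
Language: | eng |
Published: | 2014 |
Subjects: | cs.SE, Computing and Computers |
Online Access: | http://cds.cern.ch/record/2002568 |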
_version_ | 1780946083887710208 |
---|---|
author | Blomer, Jakob Berzano, Dario Buncic, Predrag Charalampidis, Ioannis Ganis, Gerardo Lestaris, George Meusel, René |
author_facet | Blomer, Jakob Berzano, Dario Buncic, Predrag Charalampidis, Ioannis Ganis, Gerardo Lestaris, George Meusel, René |
author_sort | Blomer, Jakob |
collection | CERN |
description | Scientific results in high-energy physics and in many other fields often rely on complex software stacks. In order to support reproducibility and scrutiny of the results, it is good practice to use open source software and to cite software packages and versions. With ever-growing complexity of scientific software on one side and with IT life-cycles of only a few years on the other side, however, it turns out that despite source code availability the setup and the validation of a minimal usable analysis environment can easily become prohibitively expensive. We argue that there is a substantial gap between merely having access to versioned source code and the ability to create a data analysis runtime environment. In order to preserve all the different variants of the data analysis runtime environment, we developed a snapshotting file system optimized for software distribution. We report on our experience in preserving the analysis environment for high-energy physics such as the software landscape used to discover the Higgs boson at the Large Hadron Collider. |
id | oai-inspirehep.net-1306055 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2014 |
record_format | invenio |
spelling | oai-inspirehep.net-1306055 2023-03-14T17:56:08Z http://cds.cern.ch/record/2002568 eng Blomer, Jakob Berzano, Dario Buncic, Predrag Charalampidis, Ioannis Ganis, Gerardo Lestaris, George Meusel, René The Need for a Versioned Data Analysis Software Environment cs.SE Computing and Computers Scientific results in high-energy physics and in many other fields often rely on complex software stacks. In order to support reproducibility and scrutiny of the results, it is good practice to use open source software and to cite software packages and versions. With ever-growing complexity of scientific software on one side and with IT life-cycles of only a few years on the other side, however, it turns out that despite source code availability the setup and the validation of a minimal usable analysis environment can easily become prohibitively expensive. We argue that there is a substantial gap between merely having access to versioned source code and the ability to create a data analysis runtime environment. In order to preserve all the different variants of the data analysis runtime environment, we developed a snapshotting file system optimized for software distribution. We report on our experience in preserving the analysis environment for high-energy physics such as the software landscape used to discover the Higgs boson at the Large Hadron Collider. arXiv:1407.3063 oai:inspirehep.net:1306055 2014 |
spellingShingle | cs.SE Computing and Computers Blomer, Jakob Berzano, Dario Buncic, Predrag Charalampidis, Ioannis Ganis, Gerardo Lestaris, George Meusel, René The Need for a Versioned Data Analysis Software Environment |
title | The Need for a Versioned Data Analysis Software Environment |
title_full | The Need for a Versioned Data Analysis Software Environment |
title_fullStr | The Need for a Versioned Data Analysis Software Environment |
title_full_unstemmed | The Need for a Versioned Data Analysis Software Environment |
title_short | The Need for a Versioned Data Analysis Software Environment |
title_sort | need for a versioned data analysis software environment |
topic | cs.SE Computing and Computers |
url | http://cds.cern.ch/record/2002568 |