The Need for a Versioned Data Analysis Software Environment

Scientific results in high-energy physics and in many other fields often rely on complex software stacks. In order to support reproducibility and scrutiny of the results, it is good practice to use open source software and to cite software packages and versions. With ever-growing complexity of scien...

Bibliographic Details
Main authors: Blomer, Jakob; Berzano, Dario; Buncic, Predrag; Charalampidis, Ioannis; Ganis, Gerardo; Lestaris, George; Meusel, René
Language: eng
Published: 2014
Subjects:
Online access: http://cds.cern.ch/record/2002568
_version_ 1780946083887710208
author Blomer, Jakob
Berzano, Dario
Buncic, Predrag
Charalampidis, Ioannis
Ganis, Gerardo
Lestaris, George
Meusel, René
author_facet Blomer, Jakob
Berzano, Dario
Buncic, Predrag
Charalampidis, Ioannis
Ganis, Gerardo
Lestaris, George
Meusel, René
author_sort Blomer, Jakob
collection CERN
description Scientific results in high-energy physics and in many other fields often rely on complex software stacks. In order to support reproducibility and scrutiny of the results, it is good practice to use open source software and to cite software packages and versions. With ever-growing complexity of scientific software on one side and with IT life-cycles of only a few years on the other side, however, it turns out that despite source code availability the setup and the validation of a minimal usable analysis environment can easily become prohibitively expensive. We argue that there is a substantial gap between merely having access to versioned source code and the ability to create a data analysis runtime environment. In order to preserve all the different variants of the data analysis runtime environment, we developed a snapshotting file system optimized for software distribution. We report on our experience in preserving the analysis environment for high-energy physics such as the software landscape used to discover the Higgs boson at the Large Hadron Collider.
id oai-inspirehep.net-1306055
institution European Organization for Nuclear Research
language eng
publishDate 2014
record_format invenio
spelling oai-inspirehep.net-1306055
2023-03-14T17:56:08Z
http://cds.cern.ch/record/2002568
eng
Blomer, Jakob
Berzano, Dario
Buncic, Predrag
Charalampidis, Ioannis
Ganis, Gerardo
Lestaris, George
Meusel, René
The Need for a Versioned Data Analysis Software Environment
cs.SE
Computing and Computers
Scientific results in high-energy physics and in many other fields often rely on complex software stacks. In order to support reproducibility and scrutiny of the results, it is good practice to use open source software and to cite software packages and versions. With ever-growing complexity of scientific software on one side and with IT life-cycles of only a few years on the other side, however, it turns out that despite source code availability the setup and the validation of a minimal usable analysis environment can easily become prohibitively expensive. We argue that there is a substantial gap between merely having access to versioned source code and the ability to create a data analysis runtime environment. In order to preserve all the different variants of the data analysis runtime environment, we developed a snapshotting file system optimized for software distribution. We report on our experience in preserving the analysis environment for high-energy physics such as the software landscape used to discover the Higgs boson at the Large Hadron Collider.
arXiv:1407.3063
oai:inspirehep.net:1306055
2014
spellingShingle cs.SE
Computing and Computers
Blomer, Jakob
Berzano, Dario
Buncic, Predrag
Charalampidis, Ioannis
Ganis, Gerardo
Lestaris, George
Meusel, René
The Need for a Versioned Data Analysis Software Environment
title The Need for a Versioned Data Analysis Software Environment
title_full The Need for a Versioned Data Analysis Software Environment
title_fullStr The Need for a Versioned Data Analysis Software Environment
title_full_unstemmed The Need for a Versioned Data Analysis Software Environment
title_short The Need for a Versioned Data Analysis Software Environment
title_sort need for a versioned data analysis software environment
topic cs.SE
Computing and Computers
url http://cds.cern.ch/record/2002568