
Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio

This contribution details the deployment of Rucio, the ATLAS Distributed Data Management system. The main complication is that Rucio interacts with a wide variety of external services, and connects globally distributed data centres under different technological and administrative control, at an unpr...

Bibliographic Details
Main authors:	Lassnig, Mario; Vigne, Ralph; Barisits, Martin-Stefan; Beermann, Thomas Alfons; Serfon, Cedric; Garonne, Vincent
Language:	eng
Published:	2015
Subjects:	Particle Physics - Experiment
Online access:	http://cds.cern.ch/record/2004831

_version_	1780946129134813184
author	Lassnig, Mario; Vigne, Ralph; Barisits, Martin-Stefan; Beermann, Thomas Alfons; Serfon, Cedric; Garonne, Vincent
author_facet	Lassnig, Mario; Vigne, Ralph; Barisits, Martin-Stefan; Beermann, Thomas Alfons; Serfon, Cedric; Garonne, Vincent
author_sort	Lassnig, Mario
collection	CERN
description	This contribution details the deployment of Rucio, the ATLAS Distributed Data Management system. The main complication is that Rucio interacts with a wide variety of external services, and connects globally distributed data centres under different technological and administrative control, at an unprecedented data volume. It is therefore not possible to create a duplicate instance of Rucio for testing or integration. Every software upgrade or configuration change is thus potentially disruptive and requires fail-safe software and automatic error recovery. Rucio uses a three-layer scaling and mitigation strategy based on quasi-realtime monitoring. This strategy mainly employs independent stateless services, automatic failover, and service migration. The technologies used for deployment and mitigation include OpenStack, Puppet, Graphite, HAProxy, Apache, and nginx. In this contribution, the reasons and design decisions for the deployment, the actual implementation, and an evaluation of all involved services and components are discussed.
id	cern-2004831
institution	European Organization for Nuclear Research
language	eng
publishDate	2015
record_format	invenio
spelling	cern-20048312019-09-30T06:29:59Zhttp://cds.cern.ch/record/2004831engLassnig, MarioVigne, RalphBarisits, Martin-StefanBeermann, Thomas AlfonsSerfon, CedricGaronne, VincentScalable and fail-safe deployment of the ATLAS Distributed Data Management system RucioParticle Physics - ExperimentThis contribution details the deployment of Rucio, the ATLAS Distributed Data Management system. The main complication is that Rucio interacts with a wide variety of external services, and connects globally distributed data centres under different technological and administrative control, at an unprecedented data volume. It is therefore not possible to create a duplicate instance of Rucio for testing or integration. Every software upgrade or configuration change is thus potentially disruptive and requires fail-safe software and automatic error recovery. Rucio uses a three-layer scaling and mitigation strategy based on quasi-realtime monitoring. This strategy mainly employs independent stateless services, automatic failover, and service migration. The technologies used for deployment and mitigation include OpenStack, Puppet, Graphite, HAProxy, Apache, and nginx. In this contribution, the reasons and design decisions for the deployment, the actual implementation, and an evaluation of all involved services and components are discussed.ATL-SOFT-SLIDE-2015-116oai:cds.cern.ch:20048312015-03-27
spellingShingle	Particle Physics - Experiment Lassnig, Mario Vigne, Ralph Barisits, Martin-Stefan Beermann, Thomas Alfons Serfon, Cedric Garonne, Vincent Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio
title	Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio
title_full	Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio
title_fullStr	Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio
title_full_unstemmed	Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio
title_short	Scalable and fail-safe deployment of the ATLAS Distributed Data Management system Rucio
title_sort	scalable and fail-safe deployment of the atlas distributed data management system rucio
topic	Particle Physics - Experiment
url	http://cds.cern.ch/record/2004831
work_keys_str_mv	AT lassnigmario scalableandfailsafedeploymentoftheatlasdistributeddatamanagementsystemrucio AT vigneralph scalableandfailsafedeploymentoftheatlasdistributeddatamanagementsystemrucio AT barisitsmartinstefan scalableandfailsafedeploymentoftheatlasdistributeddatamanagementsystemrucio AT beermannthomasalfons scalableandfailsafedeploymentoftheatlasdistributeddatamanagementsystemrucio AT serfoncedric scalableandfailsafedeploymentoftheatlasdistributeddatamanagementsystemrucio AT garonnevincent scalableandfailsafedeploymentoftheatlasdistributeddatamanagementsystemrucio
