ATLAS Data Carousel

The ATLAS Experiment is storing detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide: currently, in total about 200 PB of disk storage and 250 PB of tape storage is used. Data have different access characteristics due to various computational workfl...

Bibliographic Details
Main authors:	Zhao, Xin, Klimentov, Alexei, Barisits, Martin-Stefan, Borodin, Mikhail, Di Girolamo, Alessandro, Elmsheuser, Johannes, Golubkov, Dmitry, Lassnig, Mario, Walker, Rodney, Maeno, Tadashi
Language:	eng
Published:	2019
Subjects:	Particle Physics - Experiment
Online access:	http://cds.cern.ch/record/2696555

_version_	1780964180782743552
author	Zhao, Xin Klimentov, Alexei Barisits, Martin-Stefan Borodin, Mikhail Di Girolamo, Alessandro Elmsheuser, Johannes Golubkov, Dmitry Lassnig, Mario Walker, Rodney Maeno, Tadashi
author_facet	Zhao, Xin Klimentov, Alexei Barisits, Martin-Stefan Borodin, Mikhail Di Girolamo, Alessandro Elmsheuser, Johannes Golubkov, Dmitry Lassnig, Mario Walker, Rodney Maeno, Tadashi
author_sort	Zhao, Xin
collection	CERN
description	The ATLAS Experiment stores detector and simulation data in raw and derived data formats across more than 150 Grid sites worldwide: currently, in total, about 200 PB of disk storage and 250 PB of tape storage are in use. Data have different access characteristics due to various computational workflows. Raw data are processed only about once per year, whereas derived data are accessed continuously by physics researchers. Data can be accessed from a variety of media: streamed from remote locations or cached on local storage using hard disk drives or SSDs, while larger data centers provide the majority of offline storage capability via tape systems. Disk is comparatively more expensive than tape, and even for disks there are different types of drive technologies that vary considerably in price and performance. Slow data access can dramatically increase costs for computation. The estimated data storage requirements for the HL-LHC era are larger by a factor of several than the present forecast of available resources, which is based on a flat-budget assumption. On the computing side, ATLAS Distributed Computing (ADC) has been very successful in recent years in integrating HPC and HTC and in using opportunistic computing resources for Monte Carlo production. On the other hand, equivalent opportunistic storage does not exist for HEP experiments. ADC started the "Data Carousel" project to increase the usage of less expensive storage, i.e., tape or even commercial storage, so it is not limited to tape technologies exclusively. Data Carousel orchestrates data processing between workload management, data management, and storage services with the bulk data resident on offline storage. The processing is executed by staging a sliding window of inputs onto faster buffer storage and promptly processing it, such that only a small percentage of the input data is available at any one time.
With this project we aim to demonstrate that this is the natural way to dramatically reduce our storage costs. The first phase of the project was started in the fall of 2018 and was related to I/O tests of the sites' archiving systems. Now we are at Phase II, which requires a tight integration of the workload and data management systems. Additionally, the Data Carousel will study the feasibility of running multiple competing workflows from tape. The project is progressing very well and the results will be presented at this conference and used before LHC Run 3.
id	cern-2696555
institution	Organización Europea para la Investigación Nuclear
language	eng
publishDate	2019
record_format	invenio
spelling	cern-2696555 2019-10-28T20:29:37Z http://cds.cern.ch/record/2696555 eng Zhao, Xin Klimentov, Alexei Barisits, Martin-Stefan Borodin, Mikhail Di Girolamo, Alessandro Elmsheuser, Johannes Golubkov, Dmitry Lassnig, Mario Walker, Rodney Maeno, Tadashi ATLAS Data Carousel Particle Physics - Experiment The ATLAS Experiment stores detector and simulation data in raw and derived data formats across more than 150 Grid sites worldwide: currently, in total, about 200 PB of disk storage and 250 PB of tape storage are in use. Data have different access characteristics due to various computational workflows. Raw data are processed only about once per year, whereas derived data are accessed continuously by physics researchers. Data can be accessed from a variety of media: streamed from remote locations or cached on local storage using hard disk drives or SSDs, while larger data centers provide the majority of offline storage capability via tape systems. Disk is comparatively more expensive than tape, and even for disks there are different types of drive technologies that vary considerably in price and performance. Slow data access can dramatically increase costs for computation. The estimated data storage requirements for the HL-LHC era are larger by a factor of several than the present forecast of available resources, which is based on a flat-budget assumption. On the computing side, ATLAS Distributed Computing (ADC) has been very successful in recent years in integrating HPC and HTC and in using opportunistic computing resources for Monte Carlo production. On the other hand, equivalent opportunistic storage does not exist for HEP experiments. ADC started the "Data Carousel" project to increase the usage of less expensive storage, i.e., tape or even commercial storage, so it is not limited to tape technologies exclusively. Data Carousel orchestrates data processing between workload management, data management, and storage services with the bulk data resident on offline storage.
The processing is executed by staging a sliding window of inputs onto faster buffer storage and promptly processing it, such that only a small percentage of the input data is available at any one time. With this project we aim to demonstrate that this is the natural way to dramatically reduce our storage costs. The first phase of the project was started in the fall of 2018 and was related to I/O tests of the sites' archiving systems. Now we are at Phase II, which requires a tight integration of the workload and data management systems. Additionally, the Data Carousel will study the feasibility of running multiple competing workflows from tape. The project is progressing very well and the results will be presented at this conference and used before LHC Run 3. ATL-SOFT-SLIDE-2019-813 oai:cds.cern.ch:2696555 2019-10-28
spellingShingle	Particle Physics - Experiment Zhao, Xin Klimentov, Alexei Barisits, Martin-Stefan Borodin, Mikhail Di Girolamo, Alessandro Elmsheuser, Johannes Golubkov, Dmitry Lassnig, Mario Walker, Rodney Maeno, Tadashi ATLAS Data Carousel
title	ATLAS Data Carousel
title_full	ATLAS Data Carousel
title_fullStr	ATLAS Data Carousel
title_full_unstemmed	ATLAS Data Carousel
title_short	ATLAS Data Carousel
title_sort	atlas data carousel
topic	Particle Physics - Experiment
url	http://cds.cern.ch/record/2696555
work_keys_str_mv	AT zhaoxin atlasdatacarousel AT klimentovalexei atlasdatacarousel AT barisitsmartinstefan atlasdatacarousel AT borodinmikhail atlasdatacarousel AT digirolamoalessandro atlasdatacarousel AT elmsheuserjohannes atlasdatacarousel AT golubkovdmitry atlasdatacarousel AT lassnigmario atlasdatacarousel AT walkerrodney atlasdatacarousel AT maenotadashi atlasdatacarousel
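
The sliding-window staging described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the ATLAS implementation: the function name carousel, the window parameter, and the file names are hypothetical, and staging from tape is modeled as simply pulling the next name from an iterator.

```python
from collections import deque
from typing import Callable, Iterable


def carousel(inputs: Iterable[str], window: int,
             process: Callable[[str], None]) -> int:
    """Process every input while keeping at most `window` inputs staged.

    Returns the peak number of inputs resident on the buffer at once.
    All names here are illustrative assumptions, not a real ATLAS API.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pending = iter(inputs)   # inputs still resident on offline (tape) storage
    staged = deque()         # inputs currently held on the faster buffer
    exhausted = False
    peak = 0
    while True:
        # Stage from offline storage until the buffer window is full.
        while not exhausted and len(staged) < window:
            try:
                staged.append(next(pending))
            except StopIteration:
                exhausted = True
        if not staged:
            break
        peak = max(peak, len(staged))
        # Promptly process the oldest staged input, freeing its buffer slot.
        process(staged.popleft())
    return peak
```

Called with ten inputs and window=3, the function processes all ten in order while never holding more than three on the buffer, which is the property the abstract relies on: only a small fraction of the input needs fast storage at any one time.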

Similar items