
ATLAS Data Carousel

The ATLAS Experiment is storing detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide: currently, in total about 200 PB of disk storage and 250 PB of tape storage is used. Data have different access characteristics due to various computational workfl...


Bibliographic Details
Main authors: Zhao, Xin, Klimentov, Alexei, Barisits, Martin-Stefan, Borodin, Mikhail, Di Girolamo, Alessandro, Elmsheuser, Johannes, Golubkov, Dmitry, Lassnig, Mario, Walker, Rodney, Maeno, Tadashi
Language: eng
Published: 2019
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2696555
_version_ 1780964180782743552
author Zhao, Xin
Klimentov, Alexei
Barisits, Martin-Stefan
Borodin, Mikhail
Di Girolamo, Alessandro
Elmsheuser, Johannes
Golubkov, Dmitry
Lassnig, Mario
Walker, Rodney
Maeno, Tadashi
author_facet Zhao, Xin
Klimentov, Alexei
Barisits, Martin-Stefan
Borodin, Mikhail
Di Girolamo, Alessandro
Elmsheuser, Johannes
Golubkov, Dmitry
Lassnig, Mario
Walker, Rodney
Maeno, Tadashi
author_sort Zhao, Xin
collection CERN
description The ATLAS Experiment stores detector and simulation data in raw and derived formats across more than 150 Grid sites worldwide; in total, about 200 PB of disk storage and 250 PB of tape storage are currently in use. Data have different access characteristics depending on the computational workflow: raw data are processed only about once per year, whereas derived data are accessed continuously by physics researchers. Data can be accessed from a variety of media: streamed from remote locations or cached on local storage using hard disk drives or SSDs, while larger data centers provide the majority of offline storage capacity via tape systems. Disk is more expensive than tape, and disk drive technologies themselves vary considerably in price and performance. Slow data access can dramatically increase the cost of computation. The estimated data storage requirements of the HL-LHC era are several times larger than the present forecast of available resources, which is based on a flat-budget assumption. On the computing side, ATLAS Distributed Computing (ADC) has been very successful in recent years in integrating HPC and HTC resources and in using opportunistic computing resources for Monte Carlo production. Equivalent opportunistic storage, however, does not exist for HEP experiments. ADC therefore started the "Data Carousel" project to increase the use of less expensive storage, i.e., tape or even commercial storage; the project is not limited to tape technologies. Data Carousel orchestrates data processing between workload management, data management, and storage services, with the bulk of the data resident on offline storage. Processing proceeds by staging a sliding window of inputs onto faster buffer storage and promptly processing it, so that only a small percentage of the input data is available on buffer storage at any one time.
With this project we aim to demonstrate that this is the natural way to dramatically reduce our storage costs. The first phase of the project started in the fall of 2018 and covered I/O tests of the sites' archiving systems. The project is now in Phase II, which requires tight integration of the workload and data management systems. Additionally, the Data Carousel project will study the feasibility of running multiple competing workflows from tape. The project is progressing very well; the results will be presented at this conference and used before LHC Run 3.
id cern-2696555
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2696555; 2019-10-28T20:29:37Z; http://cds.cern.ch/record/2696555; eng; ATLAS Data Carousel; Particle Physics - Experiment; ATL-SOFT-SLIDE-2019-813; oai:cds.cern.ch:2696555; 2019-10-28
title ATLAS Data Carousel
title_full ATLAS Data Carousel
title_fullStr ATLAS Data Carousel
title_full_unstemmed ATLAS Data Carousel
title_short ATLAS Data Carousel
title_sort atlas data carousel
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2696555
work_keys_str_mv AT zhaoxin atlasdatacarousel
AT klimentovalexei atlasdatacarousel
AT barisitsmartinstefan atlasdatacarousel
AT borodinmikhail atlasdatacarousel
AT digirolamoalessandro atlasdatacarousel
AT elmsheuserjohannes atlasdatacarousel
AT golubkovdmitry atlasdatacarousel
AT lassnigmario atlasdatacarousel
AT walkerrodney atlasdatacarousel
AT maenotadashi atlasdatacarousel
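
The sliding-window staging described in the abstract can be illustrated with a minimal sketch. This is not the ATLAS implementation: the function name `carousel`, the `window_size` parameter, and the `stage`, `process`, and `release` callbacks are all hypothetical stand-ins for the tape recall, job execution, and buffer cleanup steps, and the sketch runs sequentially, whereas a real system would overlap staging with processing.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator


def carousel(inputs: Iterable[str],
             window_size: int,
             stage: Callable[[str], None],
             process: Callable[[str], None],
             release: Callable[[str], None]) -> int:
    """Process inputs through a bounded buffer and return the peak occupancy.

    All names here are illustrative assumptions, not a real ATLAS API.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    pending: Iterator[str] = iter(inputs)
    buffer: Deque[str] = deque()  # items currently staged on fast storage
    exhausted = False
    peak = 0

    while True:
        # Refill the window: stage new inputs until the buffer is full
        # or there is nothing left to recall from offline storage.
        while not exhausted and len(buffer) < window_size:
            try:
                item = next(pending)
            except StopIteration:
                exhausted = True
                break
            stage(item)
            buffer.append(item)

        peak = max(peak, len(buffer))
        if not buffer:
            break  # everything has been staged, processed, and released

        # Process the oldest staged item, then free its buffer space so
        # the window can slide forward over the remaining inputs.
        item = buffer.popleft()
        process(item)
        release(item)

    return peak


if __name__ == "__main__":
    files = [f"file{i}" for i in range(10)]
    peak = carousel(
        files,
        window_size=3,
        stage=lambda f: print("stage  ", f),
        process=lambda f: print("process", f),
        release=lambda f: print("release", f),
    )
    print("peak buffer occupancy:", peak)
```

The point of the sketch is the invariant: the buffer never holds more than `window_size` inputs, so the fast storage needed is set by the window size rather than by the total size of the dataset.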