
The ATLAS Data Carousel Project

The ATLAS Experiment is storing detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide: currently, in total about 200 PB of disk storage and 250 PB of tape storage are used. Data have different access characteristics due to various computational workfl...


Bibliographic Details
Main authors: Klimentov, Alexei, Barisits, Martin-Stefan, Borodin, Mikhail, Di Girolamo, Alessandro, Elmsheuser, Johannes, Golubkov, Dmitry, Lassnig, Mario, Maeno, Tadashi, Walker, Rodney, Zhao, Xin
Language: eng
Published: 2019
Subjects:
Online access: http://cds.cern.ch/record/2693664
_version_ 1780964044430114816
author Klimentov, Alexei
Barisits, Martin-Stefan
Borodin, Mikhail
Di Girolamo, Alessandro
Elmsheuser, Johannes
Golubkov, Dmitry
Lassnig, Mario
Maeno, Tadashi
Walker, Rodney
Zhao, Xin
author_facet Klimentov, Alexei
Barisits, Martin-Stefan
Borodin, Mikhail
Di Girolamo, Alessandro
Elmsheuser, Johannes
Golubkov, Dmitry
Lassnig, Mario
Maeno, Tadashi
Walker, Rodney
Zhao, Xin
author_sort Klimentov, Alexei
collection CERN
description The ATLAS Experiment is storing detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide: currently, in total about 200 PB of disk storage and 250 PB of tape storage are used. Data have different access characteristics due to various computational workflows. Raw data are processed only about once per year, whereas derived data are accessed continuously by physics researchers. Data can be accessed from a variety of media: data can be streamed from remote locations or cached on local storage using hard disk drives or SSDs, while larger data centers provide the majority of offline storage capability via tape systems. Disk is comparatively more expensive than tape, and even for disks there are different types of drive technologies that vary considerably in price and performance. Slow data access can dramatically increase costs for computation. The estimated data storage requirements for the HL-LHC era are several times larger than the present forecast of available resources, which is based on a flat-budget assumption. On the computing side, ATLAS Distributed Computing (ADC) has been very successful in recent years in integrating HPC and HTC and in using opportunistic computing resources for Monte Carlo production. On the other hand, equivalent opportunistic storage does not exist for HEP experiments. ADC started the "Data Carousel" and "Hot/Cold Storage" projects to increase the usage of less expensive storage, i.e., tape or even commercial cloud storage; the effort is not limited to tape technologies. Data Carousel orchestrates data processing between workload management, data management, and storage services with the bulk data resident on offline storage. The processing is executed by staging a sliding window of inputs onto faster buffer storage and promptly processing it, such that only a small percentage of the input data is available at any one time.
With this project we aim to demonstrate that this is the natural way to dramatically reduce our storage costs. The first phase of the project started in the fall of 2018 and was related to I/O tests of the sites' archiving systems. Now we are at Phase II, which requires a tight integration of the workload and data management systems and more intensive data migration between hot (disk) and cold (tape) storage systems. Additionally, the Data Carousel will study the feasibility of running multiple competing workflows from tape. The project is progressing very well and the results will be used before LHC Run 3. In addition, we will present the first results related to our R&D project with Google Cloud Platform for similar studies.
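The sliding-window idea in the description can be illustrated with a small, purely synchronous Python sketch. It is not the ATLAS implementation: the function name `run_carousel`, the `window_size` parameter, and the `process` callback are hypothetical names chosen for illustration, and "staging" is modeled simply as moving a file name into an in-memory buffer.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


def run_carousel(
    tape_files: Iterable[str],
    window_size: int,
    process: Callable[[str], str],
) -> Tuple[List[str], int]:
    """Process every input while keeping at most `window_size` files staged.

    Hypothetical helper: returns the processing results and the peak number
    of files that were held in the buffer at the same time.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    pending = iter(tape_files)      # inputs still resident on cold storage
    buffer: Deque[str] = deque()    # inputs currently staged on the fast buffer
    results: List[str] = []
    peak = 0

    def fill() -> None:
        # Stage inputs until the window is full or nothing is left to stage.
        nonlocal peak
        while len(buffer) < window_size:
            name = next(pending, None)
            if name is None:
                break
            buffer.append(name)
        peak = max(peak, len(buffer))

    fill()
    while buffer:
        name = buffer.popleft()         # oldest staged input is processed first
        results.append(process(name))   # processing frees one buffer slot
        fill()                          # refill the freed slot from cold storage
    return results, peak


if __name__ == "__main__":
    files = [f"raw_{i:03d}" for i in range(10)]
    done, peak = run_carousel(files, window_size=3, process=str.upper)
    print(len(done), "files processed; peak staged:", peak)
```

In this toy model the buffer never holds more than `window_size` inputs, which is the property the description relies on to keep only a small fraction of the data on fast storage. A real system would stage and process asynchronously and would have to handle staging failures, which this sketch does not attempt.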
id cern-2693664
institution European Organization for Nuclear Research
language eng
publishDate 2019
record_format invenio
spelling cern-2693664; 2019-10-16T18:40:34Z; http://cds.cern.ch/record/2693664; eng; Klimentov, Alexei; Barisits, Martin-Stefan; Borodin, Mikhail; Di Girolamo, Alessandro; Elmsheuser, Johannes; Golubkov, Dmitry; Lassnig, Mario; Maeno, Tadashi; Walker, Rodney; Zhao, Xin; The ATLAS Data Carousel Project; Particle Physics - Experiment; ATL-SOFT-SLIDE-2019-771; oai:cds.cern.ch:2693664; 2019-10-15
spellingShingle Particle Physics - Experiment
Klimentov, Alexei
Barisits, Martin-Stefan
Borodin, Mikhail
Di Girolamo, Alessandro
Elmsheuser, Johannes
Golubkov, Dmitry
Lassnig, Mario
Maeno, Tadashi
Walker, Rodney
Zhao, Xin
The ATLAS Data Carousel Project
title The ATLAS Data Carousel Project
title_full The ATLAS Data Carousel Project
title_fullStr The ATLAS Data Carousel Project
title_full_unstemmed The ATLAS Data Carousel Project
title_short The ATLAS Data Carousel Project
title_sort atlas data carousel project
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2693664
work_keys_str_mv AT klimentovalexei theatlasdatacarouselproject
AT barisitsmartinstefan theatlasdatacarouselproject
AT borodinmikhail theatlasdatacarouselproject
AT digirolamoalessandro theatlasdatacarouselproject
AT elmsheuserjohannes theatlasdatacarouselproject
AT golubkovdmitry theatlasdatacarouselproject
AT lassnigmario theatlasdatacarouselproject
AT maenotadashi theatlasdatacarouselproject
AT walkerrodney theatlasdatacarouselproject
AT zhaoxin theatlasdatacarouselproject
AT klimentovalexei atlasdatacarouselproject
AT barisitsmartinstefan atlasdatacarouselproject
AT borodinmikhail atlasdatacarouselproject
AT digirolamoalessandro atlasdatacarouselproject
AT elmsheuserjohannes atlasdatacarouselproject
AT golubkovdmitry atlasdatacarouselproject
AT lassnigmario atlasdatacarouselproject
AT maenotadashi atlasdatacarouselproject
AT walkerrodney atlasdatacarouselproject
AT zhaoxin atlasdatacarouselproject