
Federated data storage evolution in HENP: data lakes and beyond

Storage has been identified as the main challenge for future distributed computing infrastructures in Particle Physics (HL-LHC, DUNE, Belle-II) and in Astrophysics and Cosmology (SKA, LSST). In particular, the High Luminosity LHC (HL-LHC) will begin operations in 2026 with expected data volu...


Bibliographic Details
Main authors: Zarochentsev, Andrey; Espinal, Xavier; Kiryanov, Andrey; Schovancová, Jaroslava
Language: eng
Published: IOP 2020
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/1525/1/012071
http://cds.cern.ch/record/2725603
author Zarochentsev, Andrey
Espinal, Xavier
Kiryanov, Andrey
Schovancová, Jaroslava
collection CERN
description Storage has been identified as the main challenge for future distributed computing infrastructures in Particle Physics (HL-LHC, DUNE, Belle-II) and in Astrophysics and Cosmology (SKA, LSST). In particular, the High Luminosity LHC (HL-LHC) will begin operations in 2026, with data volumes expected to increase by at least an order of magnitude compared with present systems. Extrapolating from existing trends in disk and tape pricing, and assuming flat infrastructure budgets, the implications for data handling for end-user analysis are significant. HENP experiments need to manage data across a variety of media based on the types of data and their uses: from tapes (cold storage) to disks and solid state drives (hot storage) to caches (including worldwide-access data in clouds and “data lakes”). The DataLake R&D project aims to explore an evolution of distributed storage while bearing in mind the very high demands of the HL-LHC era. Its primary objective is to optimize the hardware usage and operational costs of a storage system deployed across distributed centers connected by fat networks and operated as a single service. Such storage would host a large fraction of the data and optimize the cost, eliminating inefficiencies due to fragmentation. In this talk we will highlight the current status of the project, its achievements, its interconnection with other research activities in this field such as WLCG-DOMA and ATLAS-Google DataOcean, and its future plans.
id oai-inspirehep.net-1806239
institution European Organization for Nuclear Research
language eng
publishDate 2020
publisher IOP
record_format invenio
title Federated data storage evolution in HENP: data lakes and beyond
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/1525/1/012071
http://cds.cern.ch/record/2725603