Distributed Data Collection For Next Generation ATLAS EventIndex Project

The ATLAS EventIndex currently runs in production in order to build a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at CERN Tier0, and at hundreds of grid sites, with a distributed data collection architecture...

Bibliographic Details
Main authors: Fernandez Casani, Alvaro; Barberis, Dario; Sánchez, Javier; Garcia Montoro, Carlos; Gonzalez de la Hoz, Santiago; Salt, Jose
Language: eng
Published: 2018
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2626914
author Fernandez Casani, Alvaro
Barberis, Dario
Sánchez, Javier
Garcia Montoro, Carlos
Gonzalez de la Hoz, Santiago
Salt, Jose
collection CERN
description The ATLAS EventIndex currently runs in production to build a complete catalogue of events for experiments with large amounts of data. The current approach indexes all final produced data files at the CERN Tier0 and at hundreds of grid sites, using a distributed data collection architecture in which Object Stores temporarily hold the conveyed information and references to it are sent through a Messaging System. The final backend for all the indexed data is a central Hadoop infrastructure at CERN; an Oracle relational database provides faster access to a subset of this information. In the future of ATLAS, the event, rather than the file, should be the atomic unit of information for metadata, in order to accommodate future data processing and storage technologies. Files will no longer be static quantities: they may aggregate data dynamically and allow processing at event-level granularity in heavily parallel computing environments. Treating the event as the atomic unit also simplifies the handling of loss and/or extension of data. In this sense the EventIndex will evolve towards a generalized event WhiteBoard, with the ability to build collections and virtual datasets for end users. This paper describes the current Distributed Data Collection Architecture of the ATLAS EventIndex project, with details of the Producer, Consumer and Supervisor entities, and of the protocol and the information temporarily stored in the Object Store. It also shows the data flow rates and performance achieved since the approach of using the Object Store as temporary storage was put into production in July 2017. We review the challenges imposed by the expected increase in rates, which will reach 35 billion new real events per year in Run 3 and 100 billion new real events per year in Run 4. For simulated events the numbers are even higher: 100 billion events per year in Run 3 and 300 billion events per year in Run 4.
id cern-2626914
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2018
record_format invenio
spelling cern-2626914 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/2626914 | eng | Particle Physics - Experiment | ATL-SOFT-SLIDE-2018-414 | oai:cds.cern.ch:2626914 | 2018-06-27
title Distributed Data Collection For Next Generation ATLAS EventIndex Project
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2626914