The ATLAS EventIndex and its evolution based on Apache Kudu storage
The ATLAS experiment produced hundreds of petabytes of data and expects to have one order of magnitude more in the future. These data are spread among hundreds of computing Grid sites around the world. The EventIndex catalogues the basic elements of these data: real and simulated events. It provides...
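Main authors: | Barberis, Dario; Prokoshin, Fedor; Alexandrov, Evgeny; Aleksandrov, Igor; Baranowski, Zbigniew; Canali, Luca; Dimitrov, Gancho; Fernandez Casani, Alvaro; Gallas, Elizabeth; Garcia Montoro, Carlos; Gonzalez de la Hoz, Santiago; Hrivnac, Julius; Iakovlev, Alexander; Kazymov, Andrei; Mineev, Mikhail; Rybkin, Grigori; Sánchez, Javier; Salt, José; Vasileva, Petya Tsvetanova; Villaplana Perez, Miguel |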
---|---|
Language: | eng |
Published: | 2018 |
Subjects: | Particle Physics - Experiment |
Online access: | http://cds.cern.ch/record/2646132 |
_version_ | 1780960477715628032 |
---|---|
author | Barberis, Dario; Prokoshin, Fedor; Alexandrov, Evgeny; Aleksandrov, Igor; Baranowski, Zbigniew; Canali, Luca; Dimitrov, Gancho; Fernandez Casani, Alvaro; Gallas, Elizabeth; Garcia Montoro, Carlos; Gonzalez de la Hoz, Santiago; Hrivnac, Julius; Iakovlev, Alexander; Kazymov, Andrei; Mineev, Mikhail; Rybkin, Grigori; Sánchez, Javier; Salt, José; Vasileva, Petya Tsvetanova; Villaplana Perez, Miguel |
author_sort | Barberis, Dario |
collection | CERN |
description | The ATLAS experiment produced hundreds of petabytes of data and expects to have one order of magnitude more in the future. These data are spread among hundreds of computing Grid sites around the world. The EventIndex catalogues the basic elements of these data: real and simulated events. It provides the means to select and access event data in the ATLAS distributed storage system, and supports completeness and consistency checks and data overlap studies. The EventIndex employs various data handling technologies, such as Hadoop and Oracle databases, and is integrated with other elements of the ATLAS distributed computing infrastructure, including systems for data, metadata, and production management (AMI, Rucio and PANDA). The project has been in operation since the start of LHC Run 2 in 2015, and is under continuous development in order to meet analysis and production demands and to keep up with technology evolution. The main data store in Hadoop, based on MapFiles and HBase, can work for the rest of Run 2, but new solutions are being explored for the future. Kudu offers an interesting environment, with a mixture of BigData and relational database features, which looked promising at the design level and is now being used to build a prototype to measure the scaling capabilities as a function of data input rates, total data volumes, and data query and retrieval rates. An extension of the EventIndex functionalities to support the concept of Virtual Datasets produced additional requirements that are being tested on the same Kudu prototype, in order to estimate the system performance and response times for different internal data organisations. This paper reports on the current system performance and on the first measurements of the new prototype based on Kudu. |
id | cern-2646132 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2018 |
record_format | invenio |
spelling | cern-2646132; 2019-09-30T06:29:59Z; http://cds.cern.ch/record/2646132; eng; The ATLAS EventIndex and its evolution based on Apache Kudu storage; Particle Physics - Experiment; ATL-SOFT-PROC-2018-017; oai:cds.cern.ch:2646132; 2018-11-06 |
title | The ATLAS EventIndex and its evolution based on Apache Kudu storage |
topic | Particle Physics - Experiment |
url | http://cds.cern.ch/record/2646132 |