Use of the Hadoop structured storage tools for the ATLAS EventIndex event catalogue

The ATLAS experiment collects billions of events per year of data-taking and processes them to make them available for physics analysis in several different formats. An even larger number of events is also simulated according to physics and detector models and then reconstructed and analysed...

Bibliographic Details
Main authors: Favareto, Andrea; Barberis, Dario; Cardenas Zarate, Simon Ernesto; Cranshaw, Jack; Fernandez Casani, Alvaro; Gallas, Elizabeth; Gonzalez de la Hoz, Santiago; Hrivnac, Julius; Malon, David; Prokoshin, Fedor; Salt, Jose; Sanchez, Javier; Toebbicke, Rainer; Yuan, Ruijun; Garcia Montoro, Carlos
Language: eng
Published: 2015
Subjects: Particle Physics - Experiment
Online access: http://cds.cern.ch/record/2055281
_version_ 1780948283314667520
author Favareto, Andrea
Barberis, Dario
Cardenas Zarate, Simon Ernesto
Cranshaw, Jack
Fernandez Casani, Alvaro
Gallas, Elizabeth
Gonzalez de la Hoz, Santiago
Hrivnac, Julius
Malon, David
Prokoshin, Fedor
Salt, Jose
Sanchez, Javier
Toebbicke, Rainer
Yuan, Ruijun
Garcia Montoro, Carlos
author_sort Favareto, Andrea
collection CERN
description The ATLAS experiment collects billions of events per year of data-taking and processes them to make them available for physics analysis in several different formats. An even larger number of events is also simulated according to physics and detector models, then reconstructed and analysed for comparison with real events. The EventIndex is a catalogue of all events in each production stage. For each event it includes a few identification parameters, some basic non-mutable information coming from the online system, and references to the files that contain the event in each format (plus the internal pointers to the event within each file, for quick retrieval). Each EventIndex record is logically simple, but the system has to hold many tens of billions of records, all equally important. The Hadoop technology was selected at the start of the EventIndex project development in 2012 and proved robust and flexible enough to accommodate this kind of information; both the insertion times and the query response times are acceptable for the continuous and automatic operation that started in spring 2015. This talk will describe the EventIndex data input and organisation in Hadoop and explain the operational challenges that were overcome in order to achieve the expected good performance.
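The record layout described in the abstract (identification parameters, immutable online information, and per-format file references with internal pointers) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the field names, the format label and the in-memory dictionary are hypothetical, and this is neither the actual EventIndex schema nor its Hadoop storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FileReference:
    """Where one event lives in one data format (hypothetical fields)."""
    file_id: str  # identifier of the file that holds the event
    offset: int   # internal pointer to the event within that file


@dataclass
class EventRecord:
    """One catalogue entry; all field names are illustrative assumptions."""
    run_number: int
    event_number: int
    # Basic information from the online system, never changed after insertion.
    online_info: Dict[str, str] = field(default_factory=dict)
    # One reference per data format in which the event is stored.
    references: Dict[str, FileReference] = field(default_factory=dict)


class EventCatalogue:
    """In-memory stand-in for the catalogue, keyed by (run, event)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[int, int], EventRecord] = {}

    def insert(self, record: EventRecord) -> None:
        """Store a record, replacing any earlier one with the same key."""
        self._records[(record.run_number, record.event_number)] = record

    def locate(self, run: int, event: int, fmt: str) -> Optional[FileReference]:
        """Return the file reference for an event in a format, or None."""
        record = self._records.get((run, event))
        if record is None:
            return None
        return record.references.get(fmt)


if __name__ == "__main__":
    catalogue = EventCatalogue()
    catalogue.insert(
        EventRecord(
            run_number=1,
            event_number=42,
            online_info={"trigger": "example"},
            references={"format_a": FileReference("file-0001", 7)},
        )
    )
    print(catalogue.locate(1, 42, "format_a"))
```

Keying on the pair (run number, event number) reflects only the statement that each record carries a few identification parameters; the real key may well contain more fields.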
id cern-2055281
institution European Organization for Nuclear Research
language eng
publishDate 2015
record_format invenio
spelling cern-2055281 2019-09-30T06:29:59Z http://cds.cern.ch/record/2055281 eng ATL-SOFT-PROC-2015-059 oai:cds.cern.ch:2055281 2015-09-27
title Use of the Hadoop structured storage tools for the ATLAS EventIndex event catalogue
topic Particle Physics - Experiment
url http://cds.cern.ch/record/2055281