
ATLAS TDAQ System Administration: evolution and re-design

Bibliographic details
Main authors:	Ballestrero, Sergio, Bogdanchikov, Alexander, Brasolin, Franco, Contescu, Alexandru Cristian, Dubrov, Sergei, Fazio, Daniel, Korol, Aleksandr, Lee, Christopher Jon, Scannicchio, Diana, Twomey, Matthew Shaun
Language:	English
Published:	2015
Subjects:	Particle Physics - Experiment
Online access:	https://dx.doi.org/10.1088/1742-6596/664/8/082024 http://cds.cern.ch/record/2016420

Abstract:	The ATLAS Trigger and Data Acquisition (TDAQ) system is responsible for the online processing of live data streamed from the ATLAS experiment at the Large Hadron Collider (LHC) at CERN. The online farm consists of ~3000 servers, processing the data read out from ~100 million detector channels through multiple trigger levels. During the two years of the first Long Shutdown (LS1), the ATLAS TDAQ System Administrators carried out a large amount of work: implementing numerous new software applications, upgrading the OS and the hardware, changing some design philosophies and exploiting the High Level Trigger (HLT) farm for different purposes. The OS has been upgraded to SLC6; for the largest part of the farm, which consists of net-booted nodes, this required a completely new design of the net booting system. In parallel, the migration of the configuration management systems to Puppet has been completed for both net-booted and locally booted hosts; consequently, the Post-Boot Scripts system and Quattor have been decommissioned. Virtual Machine (VM) usage has been investigated and tested, and many of our core servers now run on VMs. Virtualisation has also been used to adapt the HLT farm as a batch system, which has been used to run Monte Carlo production jobs that are mostly CPU-bound rather than I/O-bound. Finally, monitoring the health and status of ~3000 machines in the experimental area is of the utmost importance, so the obsolete Nagios v2 has been replaced with Icinga, complemented by Ganglia as a performance data provider. This paper reports what we did, why, and how, in order to improve the system and make it capable of performing for the next 3 years of ATLAS data taking.
Institution:	European Organization for Nuclear Research (CERN)
Report number:	ATL-DAQ-PROC-2015-010

ATLAS TDAQ System Administration: evolution and re-design

Similar items