Network Resiliency Implementation in the ATLAS TDAQ System

Bibliographic Details
Main author: Stancu, S N
Language: eng
Published: 2010
Subjects:
Online access: http://cds.cern.ch/record/1267385
Description
Summary: The ATLAS TDAQ system performs the real-time selection of events produced by the detector. For this purpose, approximately 2000 computers are deployed and interconnected through various high-speed networks, whose architecture has already been described. This article focuses on the implementation and validation of network connectivity resiliency (previously presented at a conceptual level). Redundancy and, where possible, load balancing are achieved through the combined use of several protocols: 802.3ad link aggregation, OSPF, VRRP, and MSTP. An innovative method for cost-efficient redundant connectivity of high-throughput, high-availability servers is presented. Furthermore, real-life examples are presented showing how redundancy works and, more importantly, how it might fail despite careful planning.