Fault-adaptive Scheduling for Data Acquisition Networks

Supporting such an all-to-all traffic matrix is challenging, as it can easily lead to congestion. Scheduling patterns are designed to avoid such congestion by spreading the communications over time: time is divided into phases, and communications are spread across them. However, current schedu...


Bibliographic Details
Main authors: Stein, Eloise; Bramas, Quentin; Colombo, Tommaso; Pelsser, Cristel
Language: eng
Published: 2023
Online access: https://dx.doi.org/10.1109/LCN58197.2023.10223324
http://cds.cern.ch/record/2875184
author Stein, Eloise
Bramas, Quentin
Colombo, Tommaso
Pelsser, Cristel
collection CERN
description Supporting such an all-to-all traffic matrix is challenging, as it can easily lead to congestion. Scheduling patterns are designed to avoid such congestion by spreading the communications over time: time is divided into phases, and communications are spread across them. However, current scheduling algorithms are not fault-tolerant. In this paper we propose a fault-adaptive, congestion-free scheduling approach to support an all-to-all exchange in a fat-tree topology. Our approach consists of computing the minimum number of communication phases required to support the all-to-all exchange with the available links, and of scheduling the communications over these phases. It enables recovery from failures and makes optimal use of the remaining bandwidth. We show that our scheduling approach performs better than the most common approach, Linear-shift scheduling. With our approach, throughput improves by roughly 80% with as few as one link failure.
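To illustrate the phase-based idea in the description, here is a minimal Python sketch of the Linear-shift baseline as it is commonly defined: in phase p, node i sends to node (i + p) mod n. This is not the paper's fault-adaptive algorithm, which the record does not detail. The function names are hypothetical, and the sketch assumes n identical end nodes and ignores the fat-tree topology, routing, and link failures. It only checks that no end node sends or receives twice within a phase.

```python
def linear_shift_schedule(n):
    """Build the Linear-shift pattern for n nodes (assumed definition).

    Returns a list of n - 1 phases; each phase is a list of
    (sender, receiver) pairs where receiver = (sender + phase) % n.
    """
    return [[(src, (src + phase) % n) for src in range(n)]
            for phase in range(1, n)]


def is_all_to_all(schedule, n):
    """True if every ordered pair of distinct nodes appears in the schedule."""
    seen = {pair for phase in schedule for pair in phase}
    expected = {(s, d) for s in range(n) for d in range(n) if s != d}
    return seen == expected


def is_endpoint_conflict_free(schedule):
    """True if, in every phase, each node sends once and receives once at most.

    This checks end nodes only; congestion on shared links inside the
    network would need the topology and routing, which are not modeled here.
    """
    for phase in schedule:
        senders = [s for s, _ in phase]
        receivers = [d for _, d in phase]
        if len(set(senders)) != len(senders):
            return False
        if len(set(receivers)) != len(receivers):
            return False
    return True


if __name__ == "__main__":
    schedule = linear_shift_schedule(4)
    for number, phase in enumerate(schedule, start=1):
        print(f"phase {number}: {phase}")
```

Because each phase is a permutation of the nodes, the pattern covers the full all-to-all exchange in n - 1 phases when every link is available. The description's point is that a fixed pattern like this does not adapt once a link fails.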
id cern-2875184
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
title Fault-adaptive Scheduling for Data Acquisition Networks
url https://dx.doi.org/10.1109/LCN58197.2023.10223324
http://cds.cern.ch/record/2875184