CMSSW Scaling Limits on Many-Core Machines

Today, LHC offline computing relies heavily on CPU resources, despite the interest in compute accelerators, such as GPUs, for the longer-term future. The number of cores per CPU socket has continued to increase steadily, reaching 64 cores (128 threads) with recent AMD EPYC processor...

Bibliographic Details
Main author: Jones, Christopher Duncan
Language: eng
Published: 2023
Subjects: Detectors and Experimental Techniques
Online access: http://cds.cern.ch/record/2872253
_version_ 1780978594087960576
author Jones, Christopher Duncan
author_facet Jones, Christopher Duncan
author_sort Jones, Christopher Duncan
collection CERN
description Today, LHC offline computing relies heavily on CPU resources, despite the interest in compute accelerators, such as GPUs, for the longer-term future. The number of cores per CPU socket has continued to increase steadily, reaching 64 cores (128 threads) with recent AMD EPYC processors and 128 cores on Ampere Altra Max ARM processors. Over the past decade, the CMS data processing framework, CMSSW, has been transformed from a single-threaded framework into a highly concurrent one. The first multithreaded version was brought into production by the start of LHC Run 2 in 2015. Since then, the framework's threading efficiency has gradually been improved by adding more levels of concurrency and reducing the number of serial code paths. The latest addition was support for concurrent Runs. In this work we review the concurrency model of CMSSW and measure its scalability with real CMS applications, such as simulation and reconstruction, on modern many-core machines. We show metrics such as event processing throughput and application memory usage with and without the contribution of I/O, as I/O has been the major scaling limitation for CMS applications.
id cern-2872253
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
spelling cern-2872253; 2023-09-25T18:53:32Z; http://cds.cern.ch/record/2872253; eng; Jones, Christopher Duncan; CMSSW Scaling Limits on Many-Core Machines; Detectors and Experimental Techniques; CMS-CR-2023-116; oai:cds.cern.ch:2872253; 2023-08-15
spellingShingle Detectors and Experimental Techniques
Jones, Christopher Duncan
CMSSW Scaling Limits on Many-Core Machines
title CMSSW Scaling Limits on Many-Core Machines
title_full CMSSW Scaling Limits on Many-Core Machines
title_fullStr CMSSW Scaling Limits on Many-Core Machines
title_full_unstemmed CMSSW Scaling Limits on Many-Core Machines
title_short CMSSW Scaling Limits on Many-Core Machines
title_sort cmssw scaling limits on many-core machines
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/2872253
work_keys_str_mv AT joneschristopherduncan cmsswscalinglimitsonmanycoremachines