
Use of DAGMan in CRAB3 to improve the splitting of CMS user jobs


Bibliographic Details
Main Authors: Wolf, M, Mascheroni, M, Woodard, A, Belforte, S, Bockelman, B, Hernandez, J M, Vaandering, E
Language: eng
Published: 2017
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/898/5/052035
http://cds.cern.ch/record/2297172
collection CERN
description CRAB3 is a workload management tool used by CMS physicists to analyze data acquired by the Compact Muon Solenoid (CMS) detector at the CERN Large Hadron Collider (LHC). Research in high energy physics often requires the analysis of large collections of files, referred to as datasets. The task is divided into jobs that are distributed among a large collection of worker nodes throughout the Worldwide LHC Computing Grid (WLCG). Splitting a large analysis task into optimally sized jobs is critical to efficient use of distributed computing resources. Jobs that are too big will have excessive runtimes and will not distribute the work across all of the available nodes. However, splitting the project into a large number of very small jobs is also inefficient, as each job creates additional overhead which increases the load on infrastructure resources. Currently, this splitting is done manually, using parameters provided by the user. However, the resources needed for each job are difficult to predict because of frequent variations in the performance of the user code and in the content of the input dataset. As a result, dividing a task into jobs by hand is difficult and often suboptimal. In this work we present a new feature called “automatic splitting”, which removes the need for users to manually specify job splitting parameters. We discuss how HTCondor DAGMan can be used to build dynamic Directed Acyclic Graphs (DAGs) to optimize the performance of large CMS analysis jobs on the Grid. We use DAGMan to dynamically generate interconnected DAGs that estimate the processing time the user code will require to analyze each event. This is used to calculate an estimate of the total processing time per job, and a set of analysis jobs is run using this estimate as a specified time limit. Some jobs may not finish within the allotted time; they are terminated at the time limit, and the unfinished data is regrouped into smaller jobs and resubmitted.
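The automatic-splitting scheme the abstract describes — estimate the per-event processing time from probe jobs, size jobs to a target runtime, then regroup whatever a terminated job left unfinished into smaller jobs — can be sketched as follows. This is a minimal illustration of the idea only; all function names, the contiguous event-range representation, and the halved tail-job runtime are assumptions, not CRAB3's actual implementation:

```python
# Illustrative sketch of "automatic splitting"; names and parameters are
# hypothetical, not taken from the CRAB3 code base.

def estimate_time_per_event(probe_results):
    """Combine probe-job measurements, each (events_processed, seconds_spent),
    into a single estimate of seconds per event."""
    total_events = sum(events for events, _ in probe_results)
    total_seconds = sum(seconds for _, seconds in probe_results)
    return total_seconds / total_events

def split_into_jobs(n_events, time_per_event, target_runtime):
    """Group events into contiguous (start, end) ranges whose estimated
    runtime fits within the target time limit."""
    events_per_job = max(1, int(target_runtime / time_per_event))
    return [(start, min(start + events_per_job, n_events))
            for start in range(0, n_events, events_per_job)]

def resubmit_unfinished(unfinished_ranges, time_per_event, target_runtime):
    """Jobs terminated at the time limit leave unprocessed event ranges;
    regroup them into smaller "tail" jobs (here: half the original target)."""
    tail_jobs = []
    for start, end in unfinished_ranges:
        tail_jobs.extend(
            (start + s, start + e)
            for s, e in split_into_jobs(end - start, time_per_event,
                                        target_runtime / 2))
    return tail_jobs

# Probes processed 2 x 50 events in 2 x 25 s -> 0.5 s/event.
tpe = estimate_time_per_event([(50, 25.0), (50, 25.0)])
# Split 10,000 events into jobs sized for a 600 s runtime -> 1200 events/job.
jobs = split_into_jobs(10_000, tpe, target_runtime=600)
# A job killed with events 0-2000 unprocessed is resplit into 300 s tail jobs.
tail = resubmit_unfinished([(0, 2000)], tpe, target_runtime=600)
```

In the real system this regrouping is what the dynamically generated DAGMan sub-DAGs orchestrate: probe nodes feed the estimate, analysis nodes run against the time limit, and tail nodes pick up the remainder.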
id oai-inspirehep.net-1638491
institution European Organization for Nuclear Research (CERN)
record_format invenio
topic Computing and Computers