Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2

The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning system. So far we have been running several independent resource pools, but we are working on unifying them all to reduce the operational load and more effectively share resources b...

Bibliographic Details
Main authors: Balcas, J, Belforte, S, Bockelman, B, Gutsche, O, Khan, F, Larson, K, Letts, J, Mascheroni, M, Mason, D, McCrea, A, Saiz-Santos, M, Sfiligoi, I
Language: eng
Published: 2015
Subjects:
Online access: https://dx.doi.org/10.1088/1742-6596/664/6/062030
http://cds.cern.ch/record/2134603
_version_ 1780949913977225216
author Balcas, J
Belforte, S
Bockelman, B
Gutsche, O
Khan, F
Larson, K
Letts, J
Mascheroni, M
Mason, D
McCrea, A
Saiz-Santos, M
Sfiligoi, I
author_facet Balcas, J
Belforte, S
Bockelman, B
Gutsche, O
Khan, F
Larson, K
Letts, J
Mascheroni, M
Mason, D
McCrea, A
Saiz-Santos, M
Sfiligoi, I
author_sort Balcas, J
collection CERN
description The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning system. So far we have been running several independent resource pools, but we are working on unifying them all to reduce the operational load and more effectively share resources between various activities in CMS. The major challenge of this unification activity is scale. The combined pool size is expected to reach 200K job slots, which is significantly bigger than any other multi-user HTCondor based system currently in production. To get there we have studied scaling limitations in our existing pools, the biggest of which tops out at about 70K slots, providing valuable feedback to the development communities, who have responded by delivering improvements which have helped us reach higher and higher scales with more stability. We have also worked on improving the organization and support model for this critical service during Run 2 of the LHC. This contribution will present the results of the scale testing and experiences from the first months of running the Global Pool.
id oai-inspirehep.net-1413951
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2015
record_format invenio
spelling oai-inspirehep.net-1413951
2022-08-10T13:00:59Z
doi:10.1088/1742-6596/664/6/062030
http://cds.cern.ch/record/2134603
eng
Balcas, J
Belforte, S
Bockelman, B
Gutsche, O
Khan, F
Larson, K
Letts, J
Mascheroni, M
Mason, D
McCrea, A
Saiz-Santos, M
Sfiligoi, I
Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
Computing and Computers
The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning system. So far we have been running several independent resource pools, but we are working on unifying them all to reduce the operational load and more effectively share resources between various activities in CMS. The major challenge of this unification activity is scale. The combined pool size is expected to reach 200K job slots, which is significantly bigger than any other multi-user HTCondor based system currently in production. To get there we have studied scaling limitations in our existing pools, the biggest of which tops out at about 70K slots, providing valuable feedback to the development communities, who have responded by delivering improvements which have helped us reach higher and higher scales with more stability. We have also worked on improving the organization and support model for this critical service during Run 2 of the LHC. This contribution will present the results of the scale testing and experiences from the first months of running the Global Pool.
FERMILAB-CONF-15-604-CD
oai:inspirehep.net:1413951
2015
spellingShingle Computing and Computers
Balcas, J
Belforte, S
Bockelman, B
Gutsche, O
Khan, F
Larson, K
Letts, J
Mascheroni, M
Mason, D
McCrea, A
Saiz-Santos, M
Sfiligoi, I
Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
title Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
title_full Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
title_fullStr Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
title_full_unstemmed Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
title_short Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
title_sort pushing htcondor and glideinwms to 200k+ jobs in a global pool for cms before run 2
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/664/6/062030
http://cds.cern.ch/record/2134603
work_keys_str_mv AT balcasj pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT belfortes pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT bockelmanb pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT gutscheo pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT khanf pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT larsonk pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT lettsj pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT mascheronim pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT masond pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT mccreaa pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT saizsantosm pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2
AT sfiligoii pushinghtcondorandglideinwmsto200kjobsinaglobalpoolforcmsbeforerun2