AutoPyFactory and the Cloud
AutoPyFactory (APF) is a next-generation pilot submission framework that has been used as part of the ATLAS workload management system (PANDA) for two years. APF is reliable, scalable, and offers easy and flexible configuration. Using a plugin-based architecture, APF polls for information from confi...
Main authors: | Caballero, J; Hover, J; Love, P |
---|---|
Language: | eng |
Published: | 2013 |
Subjects: | Detectors and Experimental Techniques |
Online access: | http://cds.cern.ch/record/1607122 |
_version_ | 1780931712603127808 |
---|---|
author | Caballero, J Hover, J Love, P |
author_facet | Caballero, J Hover, J Love, P |
author_sort | Caballero, J |
collection | CERN |
description | AutoPyFactory (APF) is a next-generation pilot submission framework that has been used as part of the ATLAS workload management system (PANDA) for two years. APF is reliable, scalable, and offers easy and flexible configuration. Using a plugin-based architecture, APF polls for information from configured information and batch systems (including grid sites), decides how many additional pilot jobs are needed, and submits them. With the advent of cloud computing, providing resources goes beyond submitting pilots to grid sites. Now, the resources on which the pilot will run also need to be managed. Handling both pilot submission and controlling the virtual machine life cycle (creation, retirement, and termination) from the same framework allows robust and efficient management of the process. In this paper we describe the design and implementation of these virtual machine management capabilities of APF. Expanding on our plugin-based approach, we allow cascades of virtual resources associated with a job queue. A single workflow can be directed first to a private, facility-based cloud, then a free academic cloud, then spot-priced EC2 resources, and finally on-demand commercial clouds. Limits, weighting, and priorities are supported, allowing free or less expensive resources to be used first, with costly resources only used when necessary. As demand drops, resources are drained and terminated in reverse order. Performance plots and time series will be included, showing how the implementation handles ramp-ups, ramp-downs, and spot terminations. |
id | cern-1607122 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2013 |
record_format | invenio |
spelling | cern-1607122 2019-09-30T06:29:59Z http://cds.cern.ch/record/1607122 eng Caballero, J Hover, J Love, P AutoPyFactory and the Cloud Detectors and Experimental Techniques AutoPyFactory (APF) is a next-generation pilot submission framework that has been used as part of the ATLAS workload management system (PANDA) for two years. APF is reliable, scalable, and offers easy and flexible configuration. Using a plugin-based architecture, APF polls for information from configured information and batch systems (including grid sites), decides how many additional pilot jobs are needed, and submits them. With the advent of cloud computing, providing resources goes beyond submitting pilots to grid sites. Now, the resources on which the pilot will run also need to be managed. Handling both pilot submission and controlling the virtual machine life cycle (creation, retirement, and termination) from the same framework allows robust and efficient management of the process. In this paper we describe the design and implementation of these virtual machine management capabilities of APF. Expanding on our plugin-based approach, we allow cascades of virtual resources associated with a job queue. A single workflow can be directed first to a private, facility-based cloud, then a free academic cloud, then spot-priced EC2 resources, and finally on-demand commercial clouds. Limits, weighting, and priorities are supported, allowing free or less expensive resources to be used first, with costly resources only used when necessary. As demand drops, resources are drained and terminated in reverse order. Performance plots and time series will be included, showing how the implementation handles ramp-ups, ramp-downs, and spot terminations. ATL-SOFT-SLIDE-2013-816 oai:cds.cern.ch:1607122 2013-10-09 |
spellingShingle | Detectors and Experimental Techniques Caballero, J Hover, J Love, P AutoPyFactory and the Cloud |
title | AutoPyFactory and the Cloud |
title_full | AutoPyFactory and the Cloud |
title_fullStr | AutoPyFactory and the Cloud |
title_full_unstemmed | AutoPyFactory and the Cloud |
title_short | AutoPyFactory and the Cloud |
title_sort | autopyfactory and the cloud |
topic | Detectors and Experimental Techniques |
url | http://cds.cern.ch/record/1607122 |
work_keys_str_mv | AT caballeroj autopyfactoryandthecloud AT hoverj autopyfactoryandthecloud AT lovep autopyfactoryandthecloud |
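
The cascade described in the abstract (cheapest resources filled first, costly ones drained first) can be illustrated with a minimal sketch. This is not APF's code or configuration format: the `Tier` class, the `scale_up` and `scale_down` functions, the tier names, and the per-tier limits are all hypothetical, and the weighting mentioned in the abstract is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One cloud resource in the cascade (hypothetical structure)."""
    name: str
    limit: int        # assumed maximum number of VMs allowed on this tier
    running: int = 0  # VMs currently running on this tier


def scale_up(tiers, needed):
    """Start VMs in priority order: fill each tier up to its limit first."""
    started = []
    for tier in tiers:
        if needed <= 0:
            break
        n = min(needed, tier.limit - tier.running)
        if n > 0:
            tier.running += n
            needed -= n
            started.append((tier.name, n))
    return started


def scale_down(tiers, excess):
    """Retire VMs in reverse priority order: costliest tier drains first."""
    retired = []
    for tier in reversed(tiers):
        if excess <= 0:
            break
        n = min(excess, tier.running)
        if n > 0:
            tier.running -= n
            excess -= n
            retired.append((tier.name, n))
    return retired


def make_cascade():
    """Tiers ordered from cheapest to most expensive; limits are invented."""
    return [
        Tier("private", 10),
        Tier("academic", 5),
        Tier("ec2-spot", 20),
        Tier("on-demand", 50),
    ]


if __name__ == "__main__":
    cascade = make_cascade()
    print(scale_up(cascade, 18))   # demand rises: 18 VMs requested
    print(scale_down(cascade, 5))  # demand drops: 5 VMs no longer needed
```

With these invented limits, a request for 18 VMs fills the private and academic tiers before touching spot instances, and a later drop of 5 empties the spot tier before reducing the academic one.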