Cargando…
Experiences with the ALICE Mesos infrastructure
Apache Mesos is a resource management system for large data centres, initially developed by UC Berkeley, and now maintained under the Apache Foundation umbrella. It is widely used in the industry by companies like Apple, Twitter, and Airbnb and it is known to scale to 10 000s of nodes. Together with...
Autores principales: | , , , |
---|---|
Lenguaje: | eng |
Publicado: |
2017
|
Materias: | |
Acceso en línea: | https://dx.doi.org/10.1088/1742-6596/898/8/082043 http://cds.cern.ch/record/2296659 |
Sumario: | Apache Mesos is a resource management system for large data centres, initially developed by UC Berkeley, and now maintained under the Apache Foundation umbrella. It is widely used in the industry by companies like Apple, Twitter, and Airbnb and it is known to scale to 10 000s of nodes. Together with other tools of its ecosystem, such as Mesosphere Marathon or Metronome, it provides an end-to-end solution for datacenter operations and a unified way to exploit large distributed systems. We present the experience of the ALICE Experiment Offline & Computing in deploying and using in production the Apache Mesos ecosystem for a variety of tasks on a small 500 cores cluster, using hybrid OpenStack and bare metal resources. We will initially introduce the architecture of our setup and its operation, we will then describe the tasks which are performed by it, including release building and QA, release validation, and simple Monte Carlo production. We will show how we developed Mesos enabled components (called “Mesos Frameworks”) to carry out ALICE specific needs. In particular, we will illustrate our effort to integrate Work Queue, a lightweight batch processing engine developed by University of Notre Dame, which ALICE uses to orchestrate release validation. Finally, we will give an outlook on how to use Mesos as resource manager for DDS, a software deployment system developed by GSI which will be the foundation of the system deployment for ALICE next generation Online-Offline (O2). |
---|