Training and Serving ML workloads with Kubeflow at CERN
Machine Learning (ML) has been growing in popularity across multiple areas and groups at CERN, with applications including fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow, that manages the full ML lifecycle: data preparation and interactive ana...
Main Authors: | Golubovic, Dejan; Rocha, Ricardo |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | Computing and Computers |
Online Access: | https://dx.doi.org/10.1051/epjconf/202125102067 http://cds.cern.ch/record/2780362 |
_version_ | 1780971865141936128 |
---|---|
author | Golubovic, Dejan Rocha, Ricardo |
author_facet | Golubovic, Dejan Rocha, Ricardo |
author_sort | Golubovic, Dejan |
collection | CERN |
description | Machine Learning (ML) has been growing in popularity across multiple areas and groups at CERN, with applications including fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow, that manages the full ML lifecycle: data preparation and interactive analysis, large-scale distributed model training, and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details for integrating accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case on public cloud resources, achieving close to linear scaling with a large number of GPUs. |
id | cern-2780362 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2021 |
record_format | invenio |
title | Training and Serving ML workloads with Kubeflow at CERN |
topic | Computing and Computers |
url | https://dx.doi.org/10.1051/epjconf/202125102067 http://cds.cern.ch/record/2780362 |
work_keys_str_mv | AT golubovicdejan trainingandservingmlworkloadswithkubeflowatcern AT rocharicardo trainingandservingmlworkloadswithkubeflowatcern |