Training and Serving ML workloads with Kubeflow at CERN
Main author: | Golubovic, Dejan |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | Conferences |
Online access: | http://cds.cern.ch/record/2767307 |
author | Golubovic, Dejan |
---|---|
collection | CERN |
description | Machine Learning (ML) has been growing in popularity across multiple areas and groups at CERN, covering fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow, that manages the full ML lifecycle: data preparation and interactive analysis, large-scale distributed model training, and model serving. We cover specific features available for hyperparameter tuning and model metadata management, as well as infrastructure details for integrating accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case on public cloud resources, achieving close to linear scaling when using a large number of GPUs. |
id | cern-2767307 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2021 |
record_format | invenio |
spelling | cern-2767307; 2022-11-02T22:25:35Z; http://cds.cern.ch/record/2767307; eng; Golubovic, Dejan; Training and Serving ML workloads with Kubeflow at CERN; 25th International Conference on Computing in High Energy & Nuclear Physics; Conferences; oai:cds.cern.ch:2767307; 2021 |
title | Training and Serving ML workloads with Kubeflow at CERN |
topic | Conferences |
url | http://cds.cern.ch/record/2767307 |