Training and Serving ML workloads with Kubeflow at CERN

Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation an...

Bibliographic Details
Main author: Golubovic, Dejan
Language: eng
Published: 2021
Subjects:
Online access: http://cds.cern.ch/record/2767307
_version_ 1780971301248172032
author Golubovic, Dejan
author_facet Golubovic, Dejan
author_sort Golubovic, Dejan
collection CERN
description Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation and interactive analysis, large-scale distributed model training, and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details to integrate accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs.
id cern-2767307
institution European Organization for Nuclear Research
language eng
publishDate 2021
record_format invenio
spelling cern-2767307 2022-11-02T22:25:35Z http://cds.cern.ch/record/2767307 eng Golubovic, Dejan Training and Serving ML workloads with Kubeflow at CERN 25th International Conference on Computing in High Energy & Nuclear Physics Conferences Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation and interactive analysis, large-scale distributed model training, and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details to integrate accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs. oai:cds.cern.ch:2767307 2021
spellingShingle Conferences
Golubovic, Dejan
Training and Serving ML workloads with Kubeflow at CERN
title Training and Serving ML workloads with Kubeflow at CERN
title_full Training and Serving ML workloads with Kubeflow at CERN
title_fullStr Training and Serving ML workloads with Kubeflow at CERN
title_full_unstemmed Training and Serving ML workloads with Kubeflow at CERN
title_short Training and Serving ML workloads with Kubeflow at CERN
title_sort training and serving ml workloads with kubeflow at cern
topic Conferences
url http://cds.cern.ch/record/2767307
work_keys_str_mv AT golubovicdejan trainingandservingmlworkloadswithkubeflowatcern
AT golubovicdejan 25thinternationalconferenceoncomputinginhighenergynuclearphysics