
Training and Serving ML workloads with Kubeflow at CERN

Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation an...


Bibliographic Details
Main author:	Golubovic, Dejan
Language:	eng
Published:	2021
Subjects:	Conferences
Online access:	http://cds.cern.ch/record/2767307

_version_	1780971301248172032
author	Golubovic, Dejan
author_facet	Golubovic, Dejan
author_sort	Golubovic, Dejan
collection	CERN
description	Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation and interactive analysis, large scale distributed model training and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details to integrate accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs.
id	cern-2767307
institution	European Organization for Nuclear Research
language	eng
publishDate	2021
record_format	invenio
spelling	cern-2767307 2022-11-02T22:25:35Z http://cds.cern.ch/record/2767307 eng Golubovic, Dejan Training and Serving ML workloads with Kubeflow at CERN 25th International Conference on Computing in High Energy & Nuclear Physics Conferences Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation and interactive analysis, large scale distributed model training and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details to integrate accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs. oai:cds.cern.ch:2767307 2021
spellingShingle	Conferences Golubovic, Dejan Training and Serving ML workloads with Kubeflow at CERN
title	Training and Serving ML workloads with Kubeflow at CERN
title_full	Training and Serving ML workloads with Kubeflow at CERN
title_fullStr	Training and Serving ML workloads with Kubeflow at CERN
title_full_unstemmed	Training and Serving ML workloads with Kubeflow at CERN
title_short	Training and Serving ML workloads with Kubeflow at CERN
title_sort	training and serving ml workloads with kubeflow at cern
topic	Conferences
url	http://cds.cern.ch/record/2767307
work_keys_str_mv	AT golubovicdejan trainingandservingmlworkloadswithkubeflowatcern AT golubovicdejan 25thinternationalconferenceoncomputinginhighenergynuclearphysics


Similar Items