Training and Serving ML workloads with Kubeflow at CERN

Bibliographic details
Main authors:	Golubovic, Dejan; Rocha, Ricardo
Language:	eng
Published:	2021
Subjects:	Computing and Computers
Online access:	https://dx.doi.org/10.1051/epjconf/202125102067 http://cds.cern.ch/record/2780362

author	Golubovic, Dejan; Rocha, Ricardo
collection	CERN
description	Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation and interactive analysis, large scale distributed model training and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details to integrate accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs.
id	cern-2780362
institution	European Organization for Nuclear Research
language	eng
publishDate	2021
record_format	invenio
title	Training and Serving ML workloads with Kubeflow at CERN
topic	Computing and Computers
url	https://dx.doi.org/10.1051/epjconf/202125102067 http://cds.cern.ch/record/2780362

Similar items