
Large-scale HPC deployment of Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN)

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existin...


Bibliographic Details
Main authors: Hildreth, Michael, Hurtado Anampa, Kenyi Paolo, Kankel, Cody, Hampton, Scott, Brenner, Paul, Johnson, Irena, Simko, Tibor
Language: eng
Published: 2020
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1051/epjconf/202024509011
http://cds.cern.ch/record/2753439
_version_ 1780969444543037440
author Hildreth, Michael
Hurtado Anampa, Kenyi Paolo
Kankel, Cody
Hampton, Scott
Brenner, Paul
Johnson, Irena
Simko, Tibor
author_facet Hildreth, Michael
Hurtado Anampa, Kenyi Paolo
Kankel, Cody
Hampton, Scott
Brenner, Paul
Johnson, Irena
Simko, Tibor
author_sort Hildreth, Michael
collection CERN
description The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA can orchestrate highly complex multi-step workflows, and it uses Kubernetes clusters both to schedule and distribute container-based workloads across a cluster of available machines and to instantiate and monitor the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA, and the components that were developed to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we implemented REANA to work with a number of different workload managers, both high-performance and high-throughput, while simultaneously removing REANA’s dependence on Kubernetes support at the worker level.
id oai-inspirehep.net-1832147
institution European Organization for Nuclear Research
language eng
publishDate 2020
record_format invenio
title Large-scale HPC deployment of Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN)
topic Computing and Computers
url https://dx.doi.org/10.1051/epjconf/202024509011
http://cds.cern.ch/record/2753439