Large-scale HPC deployment of Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN)
The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existin...
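Main authors: | Hildreth, Michael; Hurtado Anampa, Kenyi Paolo; Kankel, Cody; Hampton, Scott; Brenner, Paul; Johnson, Irena; Simko, Tibor |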
---|---|
Language: | eng |
Published: | 2020 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1051/epjconf/202024509011 http://cds.cern.ch/record/2753439 |
_version_ | 1780969444543037440 |
---|---|
author | Hildreth, Michael; Hurtado Anampa, Kenyi Paolo; Kankel, Cody; Hampton, Scott; Brenner, Paul; Johnson, Irena; Simko, Tibor |
author_sort | Hildreth, Michael |
collection | CERN |
description | The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and uses Kubernetes clusters both for scheduling and distributing container-based workloads across a cluster of available machines and for instantiating and monitoring the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA and the components that were developed in order to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we implemented REANA to work with a number of differing workload managers, including both high-performance and high-throughput systems, while simultaneously removing REANA’s dependence on Kubernetes support at the worker level. |
id | oai-inspirehep.net-1832147 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2020 |
record_format | invenio |
title | Large-scale HPC deployment of Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) |
topic | Computing and Computers |
url | https://dx.doi.org/10.1051/epjconf/202024509011 http://cds.cern.ch/record/2753439 |