Running a Pre-exascale, Geographically Distributed, Multi-cloud Scientific Simulation

As we approach the Exascale era, it is important to verify that the existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we have thus put in p...

Bibliographic Details
Main authors: Sfiligoi, Igor; Würthwein, Frank; Riedel, Benedikt; Schultz, David
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295355/
http://dx.doi.org/10.1007/978-3-030-50743-5_2
_version_ 1783546635234574336
author Sfiligoi, Igor
Würthwein, Frank
Riedel, Benedikt
Schultz, David
collection PubMed
description As we approach the Exascale era, it is important to verify that the existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we have thus put in place a pre-exascale HTCondor setup for running a scientific simulation in the Cloud, with the chosen application being IceCube’s photon propagation simulation. That is, this was not purely a demonstration run; it also produced valuable and much-needed scientific results for the IceCube collaboration. To reach the desired scale, we aggregated GPU resources across 8 GPU models from many geographic regions across Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. Using this setup, we reached a peak of over 51k GPUs, corresponding to almost 380 PFLOP32s, for a total integrated compute of about 100k GPU hours. In this paper we describe the setup, the problems that were discovered and overcome, and, briefly, the actual science output of the exercise.
format Online
Article
Text
id pubmed-7295355
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7295355 2020-06-16 Running a Pre-exascale, Geographically Distributed, Multi-cloud Scientific Simulation Sfiligoi, Igor Würthwein, Frank Riedel, Benedikt Schultz, David High Performance Computing Article As we approach the Exascale era, it is important to verify that the existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we have thus put in place a pre-exascale HTCondor setup for running a scientific simulation in the Cloud, with the chosen application being IceCube’s photon propagation simulation. That is, this was not purely a demonstration run; it also produced valuable and much-needed scientific results for the IceCube collaboration. To reach the desired scale, we aggregated GPU resources across 8 GPU models from many geographic regions across Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. Using this setup, we reached a peak of over 51k GPUs, corresponding to almost 380 PFLOP32s, for a total integrated compute of about 100k GPU hours. In this paper we describe the setup, the problems that were discovered and overcome, and, briefly, the actual science output of the exercise. 2020-05-22 /pmc/articles/PMC7295355/ http://dx.doi.org/10.1007/978-3-030-50743-5_2 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Running a Pre-exascale, Geographically Distributed, Multi-cloud Scientific Simulation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295355/
http://dx.doi.org/10.1007/978-3-030-50743-5_2