
Analysis of Checkpoint I/O Behavior

Nowadays, checkpoints have gained some relevance, given the increasing complexity of scientific applications for the use of many resources over a long period of time. Thus, in fault tolerance strategies, in addition to taking into account the impact that the application itself has on HPC systems, we...

Bibliographic Details
Main authors: León, Betzabeth, Gomez-Sanchez, Pilar, Franco, Daniel, Rexachs, Dolores, Luque, Emilio
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302326/
http://dx.doi.org/10.1007/978-3-030-50371-0_14
author León, Betzabeth
Gomez-Sanchez, Pilar
Franco, Daniel
Rexachs, Dolores
Luque, Emilio
collection PubMed
description Checkpoints have gained relevance as scientific applications grow more complex and use many resources over long periods of time. Fault tolerance strategies must therefore account not only for the impact that the application itself has on HPC systems but also for the impact of the checkpoint. The checkpoint saves information about the application and the system in stable storage so that the application can be restored if necessary. The checkpoint can be considered an I/O-intensive application, so its storage needs can have a great impact on the application. Therefore, this paper presents an analysis of the checkpoint’s I/O behavior. The number of checkpoints to be performed in an application is often related to the maximum overhead that the user is willing to introduce in the application. If we know the maximum overhead the user is willing to accept and the overhead that a checkpoint introduces, we can calculate the number of checkpoints to be performed. This overhead depends significantly on the I/O operations. The PIOM-PX tool was used to analyze the spatial and temporal I/O patterns of the checkpoint. Based on this analysis, a model was designed to predict their behavior. This information is used to calculate the number of checkpoints to be performed in an application given a maximum overhead predefined by the user. This will allow us to understand what happens when a checkpoint is created in an HPC system, in order to make decisions that adapt to the user’s requirements.
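The relationship the abstract describes between the user's maximum overhead and the overhead of a single checkpoint can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes the maximum overhead is given as a fraction of the fault-free execution time and that every checkpoint costs the same amount of time. The function name and its parameters are hypothetical and do not come from the paper or from PIOM-PX.

```python
import math


def max_checkpoints(app_time_s, max_overhead_fraction, checkpoint_time_s):
    """Return how many checkpoints fit within the user's overhead budget.

    Assumptions (illustrative, not taken from the paper):
    - app_time_s is the fault-free execution time in seconds;
    - max_overhead_fraction is the tolerated overhead, e.g. 0.25 for 25%;
    - checkpoint_time_s is the constant time cost of one checkpoint.
    """
    if app_time_s <= 0 or checkpoint_time_s <= 0:
        raise ValueError("times must be positive")
    if max_overhead_fraction < 0:
        raise ValueError("overhead fraction must be non-negative")

    # Total time the user is willing to spend on checkpointing.
    budget_s = app_time_s * max_overhead_fraction

    # Only whole checkpoints count, so round down.
    return math.floor(budget_s / checkpoint_time_s)


if __name__ == "__main__":
    # One-hour run, 25% tolerated overhead, 30 s per checkpoint.
    print(max_checkpoints(3600, 0.25, 30))
```

In practice the per-checkpoint cost would come from a measured or predicted I/O time rather than a constant, which is the part the paper's model addresses.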
format Online
Article
Text
id pubmed-7302326
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7302326 2020-06-18 Analysis of Checkpoint I/O Behavior León, Betzabeth Gomez-Sanchez, Pilar Franco, Daniel Rexachs, Dolores Luque, Emilio Computational Science – ICCS 2020 Article
2020-05-26 /pmc/articles/PMC7302326/ http://dx.doi.org/10.1007/978-3-030-50371-0_14 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Analysis of Checkpoint I/O Behavior
topic Article