Analysis of Checkpoint I/O Behavior
Main authors: | León, Betzabeth; Gomez-Sanchez, Pilar; Franco, Daniel; Rexachs, Dolores; Luque, Emilio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302326/ http://dx.doi.org/10.1007/978-3-030-50371-0_14 |
author | León, Betzabeth; Gomez-Sanchez, Pilar; Franco, Daniel; Rexachs, Dolores; Luque, Emilio |
---|---|
collection | PubMed |
description | Checkpoints have become increasingly relevant as scientific applications grow more complex and use many resources over long periods of time. Fault tolerance strategies must therefore account not only for the impact that the application itself has on HPC systems but also for the impact of the checkpoint. A checkpoint saves information about the application and the system to stable storage so that the application can be restored if necessary. Because checkpointing is I/O-intensive, its storage requirements can significantly affect the application. This paper therefore analyzes the I/O behavior of checkpoints. The number of checkpoints performed in an application is often determined by the maximum overhead the user is willing to introduce: if we know both that maximum overhead and the overhead a single checkpoint introduces, we can calculate how many checkpoints to perform. This per-checkpoint overhead depends significantly on I/O operations. The PIOM-PX tool was used to analyze the spatial and temporal I/O patterns of the checkpoint, and based on this analysis a model was designed to predict its behavior. The model’s predictions are then used to calculate the number of checkpoints to perform in an application given a user-defined maximum overhead. This helps us understand what happens when a checkpoint is created in an HPC system and supports decisions adapted to the user’s requirements. |
format | Online Article Text |
id | pubmed-7302326 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
source | Computational Science – ICCS 2020 (Article) |
dates | 2020-05-26; 2020-06-18 |
rights | © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Analysis of Checkpoint I/O Behavior |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302326/ http://dx.doi.org/10.1007/978-3-030-50371-0_14 |
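The abstract describes deriving the number of checkpoints from a user-defined maximum overhead and the overhead that a single checkpoint introduces. A minimal sketch of that arithmetic follows, assuming that each checkpoint adds the same fixed overhead, that these overheads simply add up, and that the budget is expressed as a fraction of the application's fault-free run time. The per-checkpoint estimate (data volume divided by an effective write bandwidth) is a hypothetical stand-in for illustration only; it is not the I/O model the authors built from PIOM-PX measurements, and both function names are invented for this example.

```python
import math


def estimate_checkpoint_overhead_s(checkpoint_bytes: float, write_bandwidth_bps: float) -> float:
    """Hypothetical per-checkpoint overhead in seconds.

    Assumes the application is blocked while the checkpoint is written and
    that the write proceeds at a constant effective bandwidth (bytes/s).
    """
    if write_bandwidth_bps <= 0:
        raise ValueError("write bandwidth must be positive")
    return checkpoint_bytes / write_bandwidth_bps


def max_checkpoints(app_time_s: float, max_overhead_fraction: float,
                    checkpoint_overhead_s: float) -> int:
    """Largest number of checkpoints whose summed overhead fits the user's budget.

    Assumes every checkpoint adds the same fixed overhead and that the
    overheads of successive checkpoints simply add up.
    """
    if checkpoint_overhead_s <= 0:
        raise ValueError("checkpoint overhead must be positive")
    budget_s = app_time_s * max_overhead_fraction  # total extra time the user accepts
    return math.floor(budget_s / checkpoint_overhead_s)


if __name__ == "__main__":
    # Hypothetical figures: 52 GB per checkpoint written at 4 GB/s gives 13 s each.
    overhead_s = estimate_checkpoint_overhead_s(52e9, 4e9)
    # A 1-hour run with a 10% overhead budget allows 360 s of checkpointing.
    print(max_checkpoints(3600.0, 0.10, overhead_s))  # prints 27
```

Rounding down keeps the total overhead at or below the user's budget. A model that captures how the I/O cost actually varies between checkpoints, such as one derived from measured spatial and temporal I/O patterns, would replace the constant `checkpoint_overhead_s` assumed here.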