
Two-Level Incremental Checkpoint Recovery Scheme for Reducing System Total Overheads

Long-running applications are often subject to failures, and a failure can lead to unacceptable system overheads. Checkpointing is used to reduce the losses in the event of a failure. For the two-level checkpoint recovery scheme used in long-running tasks, it is unavoidabl...


Bibliographic Details
Main authors: Li, Huixian; Pang, Liaojun; Wang, Zhangquan
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4128665/
https://www.ncbi.nlm.nih.gov/pubmed/25111048
http://dx.doi.org/10.1371/journal.pone.0104591
author Li, Huixian
Pang, Liaojun
Wang, Zhangquan
collection PubMed
description Long-running applications are often subject to failures, and a failure can lead to unacceptable system overheads. Checkpointing is used to reduce the losses in the event of a failure. In the two-level checkpoint recovery scheme used for long-running tasks, the system must periodically transfer a large memory context to remote stable storage. The overhead of setting checkpoints and the re-computing time therefore become critical issues that directly affect the total system overhead. Motivated by these concerns, this paper presents a new model that introduces i-checkpoints into the existing two-level checkpoint recovery scheme, so that the more probable failures are handled at lower cost and higher speed. The proposed scheme does not depend on a specific failure distribution and can be applied to different distribution types. We compare the two-level incremental and two-level checkpoint recovery schemes under the Weibull and exponential distributions, both of which fit the actual failure distribution best. The comparison shows that the total overhead of setting checkpoints, the total re-computing time, and the total system overhead are all significantly smaller in the two-level incremental scheme than in the two-level scheme. Finally, the limitations of the study are discussed, and open questions and possible future work are given.
format Online
Article
Text
id pubmed-4128665
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4128665 2014-08-12 Two-Level Incremental Checkpoint Recovery Scheme for Reducing System Total Overheads Li, Huixian Pang, Liaojun Wang, Zhangquan PLoS One Research Article
Public Library of Science 2014-08-11 /pmc/articles/PMC4128665/ /pubmed/25111048 http://dx.doi.org/10.1371/journal.pone.0104591 Text en © 2014 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Two-Level Incremental Checkpoint Recovery Scheme for Reducing System Total Overheads
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4128665/
https://www.ncbi.nlm.nih.gov/pubmed/25111048
http://dx.doi.org/10.1371/journal.pone.0104591