Fault-Detection Managers: More May Not Be the Merrier
Main authors: | Zamani, Ghazal; Das, Olivia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Netherlands, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896543/ https://www.ncbi.nlm.nih.gov/pubmed/33642963 http://dx.doi.org/10.1007/s10723-021-09546-2 |
author | Zamani, Ghazal; Das, Olivia |
collection | PubMed |
description | A fault management system contains managers that detect faults and initiate recovery actions. Such management systems often have an architecture that is not only distributed but also decoupled from the applications. Although this arrangement promotes scalability, it unfortunately makes the recovery of applications dependent on the fault management system itself. This work introduces two novel equations to meet the performance objectives of applications. To this end, we first create an equation that estimates the maximum number of jobs an application instance can handle while meeting a given performance objective. This formula is then used by an admission control mechanism to restrict the number of jobs (targeted for operational application instances) allowed to enter the system. Next, we create a second equation that computes the response time distribution of an application. Thereafter, we develop a simulation model that predicts the impact of failures in four sample fault management architectures on the application’s performance. Exploiting our equations, we compare the architectures in terms of three distinct ways of handling affected jobs when application instances fail: allow job loss; retry jobs, resulting in overload; or employ admission control to mitigate the overload. Our simulation results show that boosting the number of managers may not always be beneficial; rather, the interconnection topology (i.e., the layout of interconnects linking the architectural components) of the management architecture, together with the model parameter values, may sometimes play a bigger role in the application’s performance. |
id | pubmed-7896543 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | J Grid Comput (Article); Springer Netherlands, published 2021-02-20 |
rights | © The Author(s), under exclusive licence to Springer Nature B.V., part of Springer Nature 2021. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
topic | Article |
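The abstract names three ways of handling jobs displaced when an application instance fails (allow job loss, retry with overload, admission control). The sketch below is a hypothetical illustration of how those policies differ, not the authors' model: `n_max` stands in for the per-instance job limit that the paper derives from a performance objective (that equation is not reproduced in this record), and the names `FailurePolicy`, `reassign`, and `Outcome` are invented for illustration. Failure-detection delay and the management architecture itself are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class FailurePolicy(Enum):
    """Hypothetical labels for the three handling strategies named in the abstract."""
    ALLOW_LOSS = "loss"          # drop the jobs of the failed instance
    RETRY = "retry"              # resubmit all of them, possibly overloading survivors
    ADMISSION_CONTROL = "admit"  # resubmit only up to the per-instance cap


@dataclass
class Outcome:
    load_per_instance: list[int]  # jobs held by each surviving instance
    lost: int                     # jobs dropped (loss) or rejected (admission control)
    overloaded: int               # surviving instances holding more than n_max jobs


def reassign(loads: list[int], failed: set[int], n_max: int,
             policy: FailurePolicy) -> Outcome:
    """Redistribute jobs from failed instances across the surviving ones.

    Assumption: n_max is a given input standing in for the paper's
    maximum-jobs equation, which is not shown here.
    """
    survivors = [i for i in range(len(loads)) if i not in failed]
    new_loads = [loads[i] for i in survivors]
    displaced = sum(loads[i] for i in failed)
    if not survivors:
        # Nothing left to run the displaced jobs on.
        return Outcome([], displaced, 0)

    lost = 0
    if policy is FailurePolicy.ALLOW_LOSS:
        lost = displaced
    elif policy is FailurePolicy.RETRY:
        # Round-robin every displaced job onto survivors, ignoring the cap.
        for k in range(displaced):
            new_loads[k % len(new_loads)] += 1
    else:
        # Admit a displaced job only where a survivor still has headroom.
        for k in range(len(new_loads)):
            take = min(displaced, max(0, n_max - new_loads[k]))
            new_loads[k] += take
            displaced -= take
        lost = displaced  # whatever did not fit is rejected at admission

    overloaded = sum(1 for x in new_loads if x > n_max)
    return Outcome(new_loads, lost, overloaded)
```

For instance, with four instances each holding 8 jobs, a cap of `n_max = 10`, and instance 3 failing, the three policies give survivor loads of `[8, 8, 8]` with 8 jobs lost, `[11, 11, 10]` with two instances overloaded, and `[10, 10, 10]` with 2 jobs rejected, respectively. The trade-off mirrors the one the abstract describes: admission control avoids overload at the cost of turning some displaced jobs away.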