Fault-Detection Managers: More May Not Be the Merrier

A fault management system contains managers that detect faults as well as initiate recovery actions. Such management systems often come in an architecture that is not only a distributed one but also decoupled from the applications. Although an arrangement like this promotes scalability, it unfortuna...

Bibliographic Details
Main authors:	Zamani, Ghazal; Das, Olivia
Format:	Online Article Text
Language:	English
Published:	Springer Netherlands 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896543/ https://www.ncbi.nlm.nih.gov/pubmed/33642963 http://dx.doi.org/10.1007/s10723-021-09546-2

_version_	1783653565162586112
author	Zamani, Ghazal Das, Olivia
author_facet	Zamani, Ghazal Das, Olivia
author_sort	Zamani, Ghazal
collection	PubMed
description	A fault management system contains managers that detect faults as well as initiate recovery actions. Such management systems often come in an architecture that is not only a distributed one but also decoupled from the applications. Although an arrangement like this promotes scalability, it unfortunately makes the recovery of applications dependent on the fault management system itself. This work introduces two novel equations to meet the performance objectives of applications. To this end, we first create an equation that estimates the maximum number of jobs to be handled by an application instance for meeting a given performance objective. This formula is then used by admission control mechanism to restrict the number of jobs (targeted for operational application instances) to be allowed to enter the system. Next, we create a second equation that computes the response time distribution of an application. Thereafter, we develop a simulation model that predicts the impact of the failure of four sample fault management architectures on application’s performance. Exploiting our equations, we compare the architectures in terms of three distinct ways of handling affected jobs when application instances fail—allow job loss; retry jobs resulting in overload; employ admission control to mitigate the overload. Our simulation results show that boosting the number of managers may not always be beneficial; rather, it could possibly be the interconnection topology (i.e. the layout of interconnects linking the architectural components) of the management architecture, together with the model parameter values that may sometimes have a bigger role to play in the application’s performance.
format	Online Article Text
id	pubmed-7896543
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer Netherlands
record_format	MEDLINE/PubMed
spelling	pubmed-78965432021-02-22 Fault-Detection Managers: More May Not Be the Merrier Zamani, Ghazal Das, Olivia J Grid Comput Article A fault management system contains managers that detect faults as well as initiate recovery actions. Such management systems often come in an architecture that is not only a distributed one but also decoupled from the applications. Although an arrangement like this promotes scalability, it unfortunately makes the recovery of applications dependent on the fault management system itself. This work introduces two novel equations to meet the performance objectives of applications. To this end, we first create an equation that estimates the maximum number of jobs to be handled by an application instance for meeting a given performance objective. This formula is then used by admission control mechanism to restrict the number of jobs (targeted for operational application instances) to be allowed to enter the system. Next, we create a second equation that computes the response time distribution of an application. Thereafter, we develop a simulation model that predicts the impact of the failure of four sample fault management architectures on application’s performance. Exploiting our equations, we compare the architectures in terms of three distinct ways of handling affected jobs when application instances fail—allow job loss; retry jobs resulting in overload; employ admission control to mitigate the overload. Our simulation results show that boosting the number of managers may not always be beneficial; rather, it could possibly be the interconnection topology (i.e. the layout of interconnects linking the architectural components) of the management architecture, together with the model parameter values that may sometimes have a bigger role to play in the application’s performance. 
Springer Netherlands 2021-02-20 2021 /pmc/articles/PMC7896543/ /pubmed/33642963 http://dx.doi.org/10.1007/s10723-021-09546-2 Text en © The Author(s), under exclusive licence to Springer Nature B.V. part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Zamani, Ghazal Das, Olivia Fault-Detection Managers: More May Not Be the Merrier
title	Fault-Detection Managers: More May Not Be the Merrier
title_full	Fault-Detection Managers: More May Not Be the Merrier
title_fullStr	Fault-Detection Managers: More May Not Be the Merrier
title_full_unstemmed	Fault-Detection Managers: More May Not Be the Merrier
title_short	Fault-Detection Managers: More May Not Be the Merrier
title_sort	fault-detection managers: more may not be the merrier
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896543/ https://www.ncbi.nlm.nih.gov/pubmed/33642963 http://dx.doi.org/10.1007/s10723-021-09546-2
work_keys_str_mv	AT zamanighazal faultdetectionmanagersmoremaynotbethemerrier AT dasolivia faultdetectionmanagersmoremaynotbethemerrier
