High-Availability Computing Platform with Sensor Fault Resilience

Modern computing platforms usually use multiple sensors to report system information. To achieve high availability (HA) for the platform, the sensors can be used to efficiently detect system faults that make a cloud service unavailable. However, a sensor may fail and disable HA protection. In...

Bibliographic Details
Main authors: Lee, Yen-Lin; Arizky, Shinta Nuraisya; Chen, Yu-Ren; Liang, Deron; Wang, Wei-Jen
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7828599/
https://www.ncbi.nlm.nih.gov/pubmed/33451105
http://dx.doi.org/10.3390/s21020542
_version_ 1783641045731377152
author Lee, Yen-Lin
Arizky, Shinta Nuraisya
Chen, Yu-Ren
Liang, Deron
Wang, Wei-Jen
author_facet Lee, Yen-Lin
Arizky, Shinta Nuraisya
Chen, Yu-Ren
Liang, Deron
Wang, Wei-Jen
author_sort Lee, Yen-Lin
collection PubMed
description Modern computing platforms usually use multiple sensors to report system information. To achieve high availability (HA) for the platform, the sensors can be used to efficiently detect system faults that make a cloud service unavailable. However, a sensor may fail and disable HA protection. In this case, human intervention is needed, either to change the original fault model or to fix the sensor fault. Therefore, this study proposes an HA mechanism that can continuously provide HA to a cloud system based on dynamic fault model reconstruction. We have implemented the proposed HA mechanism on a four-layer OpenStack cloud system and tested its performance for all possible sets of sensor faults. For each fault model, we inject possible system faults and measure the average fault detection time. The experimental results show that the proposed mechanism can accurately detect and recover from an injected system fault even when sensors are disabled. In addition, the system fault detection time increases as the number of sensor faults increases, until the HA mechanism is degraded to a one-system-fault model, which is the worst case and corresponds to heartbeating at the system layer.
format Online
Article
Text
id pubmed-7828599
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-78285992021-01-25 High-Availability Computing Platform with Sensor Fault Resilience Lee, Yen-Lin Arizky, Shinta Nuraisya Chen, Yu-Ren Liang, Deron Wang, Wei-Jen Sensors (Basel) Article Modern computing platforms usually use multiple sensors to report system information. In order to achieve high availability (HA) for the platform, the sensors can be used to efficiently detect system faults that make a cloud service not live. However, a sensor may fail and disable HA protection. In this case, human intervention is needed, either to change the original fault model or to fix the sensor fault. Therefore, this study proposes an HA mechanism that can continuously provide HA to a cloud system based on dynamic fault model reconstruction. We have implemented the proposed HA mechanism on a four-layer OpenStack cloud system and tested the performance of the proposed mechanism for all possible sets of sensor faults. For each fault model, we inject possible system faults and measure the average fault detection time. The experimental result shows that the proposed mechanism can accurately detect and recover an injected system fault with disabled sensors. In addition, the system fault detection time increases as the number of sensor faults increases, until the HA mechanism is degraded to a one-system-fault model, which is the worst case as the system layer heartbeating. MDPI 2021-01-13 /pmc/articles/PMC7828599/ /pubmed/33451105 http://dx.doi.org/10.3390/s21020542 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Lee, Yen-Lin
Arizky, Shinta Nuraisya
Chen, Yu-Ren
Liang, Deron
Wang, Wei-Jen
High-Availability Computing Platform with Sensor Fault Resilience
title High-Availability Computing Platform with Sensor Fault Resilience
title_full High-Availability Computing Platform with Sensor Fault Resilience
title_fullStr High-Availability Computing Platform with Sensor Fault Resilience
title_full_unstemmed High-Availability Computing Platform with Sensor Fault Resilience
title_short High-Availability Computing Platform with Sensor Fault Resilience
title_sort high-availability computing platform with sensor fault resilience
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7828599/
https://www.ncbi.nlm.nih.gov/pubmed/33451105
http://dx.doi.org/10.3390/s21020542
work_keys_str_mv AT leeyenlin highavailabilitycomputingplatformwithsensorfaultresilience
AT arizkyshintanuraisya highavailabilitycomputingplatformwithsensorfaultresilience
AT chenyuren highavailabilitycomputingplatformwithsensorfaultresilience
AT liangderon highavailabilitycomputingplatformwithsensorfaultresilience
AT wangweijen highavailabilitycomputingplatformwithsensorfaultresilience