
Application of multivariate time-series model for high performance computing (HPC) fault prediction

To meet the high reliability demands of increasingly large and complex supercomputing systems, this paper proposes CBA-net (CNN-BiLSTM-Attention), a multidimensional fusion fault prediction model that operates on data preprocessed and classified with HDBSCAN clustering, which can effectively extract and learn...


Bibliographic Details
Main authors: Pei, Xiangdong; Yuan, Min; Mao, Guo; Pang, Zhengbin
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10581458/
https://www.ncbi.nlm.nih.gov/pubmed/37847694
http://dx.doi.org/10.1371/journal.pone.0281519
author Pei, Xiangdong
Yuan, Min
Mao, Guo
Pang, Zhengbin
collection PubMed
description To meet the high reliability demands of increasingly large and complex supercomputing systems, this paper proposes CBA-net (CNN-BiLSTM-Attention), a multidimensional fusion fault prediction model that operates on data preprocessed and classified with HDBSCAN clustering. The model effectively extracts and learns the spatial and temporal features in the predecessor fault logs, and offers high sensitivity to time-series features and thorough extraction of local features. Experiments show that the model's RMSE for fault occurrence time prediction is 0.031 and that its accuracy in predicting the node location of a fault is 93% on average. The model converges quickly and improves the granularity and accuracy of fault prediction for large supercomputers.
format Online
Article
Text
id pubmed-10581458
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10581458 2023-10-18 Application of multivariate time-series model for high performance computing (HPC) fault prediction. Pei, Xiangdong; Yuan, Min; Mao, Guo; Pang, Zhengbin. PLoS One. Research Article. Public Library of Science 2023-10-17. /pmc/articles/PMC10581458/ /pubmed/37847694 http://dx.doi.org/10.1371/journal.pone.0281519 Text en © 2023 Pei et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Application of multivariate time-series model for high performance computing (HPC) fault prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10581458/
https://www.ncbi.nlm.nih.gov/pubmed/37847694
http://dx.doi.org/10.1371/journal.pone.0281519