Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers

Performance analysis is an essential task in high-performance computing (HPC) systems, and it is applied for different purposes, such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of key performance indicators (KPIs) to supervise...

Bibliographic Details
Main Authors: Halawa, Mohamed S.; Díaz Redondo, Rebeca P.; Fernández Vilas, Ana
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7435729/
https://www.ncbi.nlm.nih.gov/pubmed/32718093
http://dx.doi.org/10.3390/s20154111
_version_ 1783572389921030144
author Halawa, Mohamed S.
Díaz Redondo, Rebeca P.
Fernández Vilas, Ana
author_facet Halawa, Mohamed S.
Díaz Redondo, Rebeca P.
Fernández Vilas, Ana
author_sort Halawa, Mohamed S.
collection PubMed
description Performance analysis is an essential task in high-performance computing (HPC) systems, and it is applied for different purposes, such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of key performance indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs give data about CPU usage, memory usage, network (interface) traffic, or other sensors that monitor the hardware. By analyzing this data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution of this paper is to identify which metric or metrics (KPIs) are the most appropriate for identifying and classifying different types of jobs according to their behavior in the HPC system. With this aim, we applied different clustering techniques (partition and hierarchical clustering algorithms) to a real dataset from the Galician computation center (CESGA). We concluded that (i) the metrics (KPIs) related to network (interface) traffic monitoring provided the best cohesion and separation for clustering HPC jobs, and (ii) hierarchical clustering algorithms were the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center.
format Online
Article
Text
id pubmed-7435729
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7435729 2020-08-25 Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers Halawa, Mohamed S. Díaz Redondo, Rebeca P. Fernández Vilas, Ana Sensors (Basel) Article Performance analysis is an essential task in high-performance computing (HPC) systems, and it is applied for different purposes, such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of key performance indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs give data about CPU usage, memory usage, network (interface) traffic, or other sensors that monitor the hardware. By analyzing this data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution of this paper is to identify which metric or metrics (KPIs) are the most appropriate for identifying and classifying different types of jobs according to their behavior in the HPC system. With this aim, we applied different clustering techniques (partition and hierarchical clustering algorithms) to a real dataset from the Galician computation center (CESGA). We concluded that (i) the metrics (KPIs) related to network (interface) traffic monitoring provided the best cohesion and separation for clustering HPC jobs, and (ii) hierarchical clustering algorithms were the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center. MDPI 2020-07-23 /pmc/articles/PMC7435729/ /pubmed/32718093 http://dx.doi.org/10.3390/s20154111 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Halawa, Mohamed S.
Díaz Redondo, Rebeca P.
Fernández Vilas, Ana
Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
title Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
title_full Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
title_fullStr Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
title_full_unstemmed Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
title_short Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
title_sort unsupervised kpis-based clustering of jobs in hpc data centers
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7435729/
https://www.ncbi.nlm.nih.gov/pubmed/32718093
http://dx.doi.org/10.3390/s20154111
work_keys_str_mv AT halawamohameds unsupervisedkpisbasedclusteringofjobsinhpcdatacenters
AT diazredondorebecap unsupervisedkpisbasedclusteringofjobsinhpcdatacenters
AT fernandezvilasana unsupervisedkpisbasedclusteringofjobsinhpcdatacenters