Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
Performance analysis is an essential task in high-performance computing (HPC) systems, and it is applied for different purposes, such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of key performance indicators (KPIs) to supervise...
Main authors: | Halawa, Mohamed S.; Díaz Redondo, Rebeca P.; Fernández Vilas, Ana |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7435729/ https://www.ncbi.nlm.nih.gov/pubmed/32718093 http://dx.doi.org/10.3390/s20154111 |
_version_ | 1783572389921030144 |
---|---|
author | Halawa, Mohamed S. Díaz Redondo, Rebeca P. Fernández Vilas, Ana |
author_facet | Halawa, Mohamed S. Díaz Redondo, Rebeca P. Fernández Vilas, Ana |
author_sort | Halawa, Mohamed S. |
collection | PubMed |
description | Performance analysis is an essential task in high-performance computing (HPC) systems, and it is applied for different purposes, such as anomaly detection, optimal resource allocation, and budget planning. HPC monitoring tasks generate a huge number of key performance indicators (KPIs) to supervise the status of the jobs running in these systems. KPIs give data about CPU usage, memory usage, network (interface) traffic, or other sensors that monitor the hardware. By analyzing these data, it is possible to obtain insightful information about running jobs, such as their characteristics, performance, and failures. The main contribution of this paper was to identify which metrics (KPIs) are the most appropriate for classifying different types of jobs according to their behavior in the HPC system. To this end, we applied different clustering techniques (partition and hierarchical clustering algorithms) using a real dataset from the Galician computation center (CESGA). We concluded that (i) the metrics (KPIs) related to network (interface) traffic monitoring provided the best cohesion and separation when clustering HPC jobs, and (ii) hierarchical clustering algorithms were the most suitable for this task. Our approach was validated using a different real dataset from the same HPC center. |
format | Online Article Text |
id | pubmed-7435729 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7435729 2020-08-25 Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers Halawa, Mohamed S. Díaz Redondo, Rebeca P. Fernández Vilas, Ana Sensors (Basel) Article MDPI 2020-07-23 /pmc/articles/PMC7435729/ /pubmed/32718093 http://dx.doi.org/10.3390/s20154111 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Halawa, Mohamed S. Díaz Redondo, Rebeca P. Fernández Vilas, Ana Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers |
title | Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers |
title_full | Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers |
title_fullStr | Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers |
title_full_unstemmed | Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers |
title_short | Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers |
title_sort | unsupervised kpis-based clustering of jobs in hpc data centers |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7435729/ https://www.ncbi.nlm.nih.gov/pubmed/32718093 http://dx.doi.org/10.3390/s20154111 |
work_keys_str_mv | AT halawamohameds unsupervisedkpisbasedclusteringofjobsinhpcdatacenters AT diazredondorebecap unsupervisedkpisbasedclusteringofjobsinhpcdatacenters AT fernandezvilasana unsupervisedkpisbasedclusteringofjobsinhpcdatacenters |
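
The description in this record mentions comparing partition and hierarchical clustering of jobs by cohesion and separation. The following is a minimal sketch of that kind of comparison, not the authors' method: it assumes k-means as the partition algorithm, average-linkage agglomerative clustering as the hierarchical one, and the silhouette coefficient as the cohesion/separation score, none of which the record confirms. The data are synthetic two-KPI points invented for illustration (not the CESGA dataset), and all function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Partition clustering: plain k-means with a deterministic maximin start."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def agglomerative(points, k):
    """Hierarchical clustering: merge the two closest clusters (average linkage)."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette: near 1 means cohesive, well-separated clusters (needs >= 2 clusters)."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(points[i], points[j]) for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in for per-job KPIs, e.g. (network in, network out); values are made up.
rng = random.Random(42)


def synthetic_jobs(center, n=20, spread=2.0):
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread)) for _ in range(n)]


points = synthetic_jobs((10.0, 10.0)) + synthetic_jobs((100.0, 100.0))
km_labels = kmeans(points, k=2)
hc_labels = agglomerative(points, k=2)
print("k-means silhouette:      ", round(silhouette(points, km_labels), 3))
print("agglomerative silhouette:", round(silhouette(points, hc_labels), 3))
```

On data this cleanly separated both algorithms should recover the same two groups; differences between them would only show up on noisier, real KPI data, where the choice of KPI subset and linkage matters.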