
Machine Learning Based Datacenter Monitoring Framework

Bibliographic details
Main author: Sidhu, Ravneet Singh
Language: English
Published: 2018
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/2318388
collection CERN
description Monitoring the health of large data centers is a major concern given the ever-increasing demand for grid/cloud computing and the growing need for computational power. In a High Performance Computing (HPC) environment, the need to maintain high availability makes monitoring tasks and hardware even more daunting and demanding. As data centers grow, it becomes hard to manage the complex interactions between different systems. Many open-source tools, such as Nagios, Ganglia, and Torque, have been deployed to report the state of each individual machine. In this work we focus on detecting and predicting data center anomalies using a machine-learning-based approach. We propose combining data from multiple monitoring solutions into a single high-dimensional vector-based model, which is then fed into a machine-learning algorithm. With this approach we look for patterns and associations among the different attributes of a data center that remain hidden in a single-system context. Using disparate monitoring systems in conjunction gives a holistic view of the cluster, increases the probability of finding critical issues before they occur, and makes it possible to alert the system administrator in advance.
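The abstract does not say which machine-learning algorithm is used. As a minimal sketch, assuming each monitoring tool yields simple name/value readings per host, the Python code below merges readings from several sources into one fixed-order feature vector and scores it by its per-feature z-score distance from a baseline of healthy snapshots. The metric names (`load_one`, `mem_free_gb`, `ssh_ok`, `jobs_running`), the helper functions, and the threshold of 3.0 are hypothetical illustrations, and the z-score detector is a simple stand-in, not the author's method.

```python
import math
from statistics import mean, pstdev

# Hypothetical sketch: metric names and the threshold are illustrative only,
# and the z-score distance is a stand-in for the thesis's (unspecified) ML model.

def build_vector(readings, feature_order):
    """Flatten {source: {metric: value}} into one vector with a fixed feature order."""
    flat = {f"{src}.{name}": float(val)
            for src, metrics in readings.items()
            for name, val in metrics.items()}
    # Simplistic choice: a missing feature defaults to 0.0 so all vectors have equal length.
    return [flat.get(f, 0.0) for f in feature_order]

def fit_baseline(vectors):
    """Per-feature mean and population standard deviation over healthy snapshots."""
    cols = list(zip(*vectors))
    means = [mean(c) for c in cols]
    # Constant features have std 0; use 1.0 instead to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds

def anomaly_score(vector, means, stds):
    """Euclidean norm of the per-feature z-scores (larger means more unusual)."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(vector, means, stds)))

FEATURES = ["ganglia.load_one", "ganglia.mem_free_gb", "nagios.ssh_ok", "torque.jobs_running"]
THRESHOLD = 3.0  # hypothetical alerting threshold

def snapshot(load, mem, ssh, jobs):
    """Build one combined vector from three hypothetical monitoring sources."""
    return build_vector({"ganglia": {"load_one": load, "mem_free_gb": mem},
                         "nagios": {"ssh_ok": ssh},
                         "torque": {"jobs_running": jobs}}, FEATURES)

history = [snapshot(2.0, 30.0, 1, 8), snapshot(2.5, 28.0, 1, 8),
           snapshot(1.5, 32.0, 1, 7), snapshot(2.0, 30.0, 1, 9)]
means, stds = fit_baseline(history)

normal = snapshot(2.0, 30.0, 1, 8)
suspect = snapshot(9.0, 2.0, 0, 8)  # high load, low memory, failed SSH check
for label, vec in [("normal", normal), ("suspect", suspect)]:
    score = anomaly_score(vec, means, stds)
    print(label, round(score, 2), "ALERT" if score > THRESHOLD else "ok")
```

Keeping a fixed feature order is what makes vectors from different hosts and time steps comparable. No single source flags the suspect snapshot on its own, but the combined score does, which mirrors the cross-system idea in the abstract. The scoring function could be replaced by a trained model without changing how the vector is built.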
id oai-inspirehep.net-1642953
institution European Organization for Nuclear Research
record_format invenio
report_number CERN-THESIS-2016-377
topic Computing and Computers