Machine Learning Based Datacenter Monitoring Framework

Bibliographic Details
Main author: Sidhu, Ravneet Singh
Language: eng
Published: 2018
Subjects:
Online access: http://cds.cern.ch/record/2318388
Description
Summary: Monitoring the health of large data centers is a major concern, given the ever-increasing demand for grid/cloud computing and the growing need for computational power. In a High Performance Computing (HPC) environment, the need to maintain high availability makes monitoring tasks and hardware more daunting and demanding. As data centers grow, it becomes hard to manage the complex interactions between different systems. Many open source monitoring systems, such as Nagios, Ganglia, or Torque, have been implemented, each reporting a specific state of an individual machine. In this work we focus on the detection and prediction of data center anomalies using a machine learning based approach. We present the idea of combining monitoring data from multiple monitoring solutions into a single high-dimensional vector based model, which is then fed into a machine-learning algorithm. With this approach we aim to find patterns and associations among the different attributes of a data center that remain hidden in the context of a single system. Using disparate monitoring systems in conjunction gives a holistic view of the cluster, increasing the probability of finding critical issues before they occur and alerting the system administrator.
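
The idea of merging several monitoring sources into one vector can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the metric names (`ping_ms`, `cpu_load`, `mem_used_pct`, `running_jobs`), the sample values, the choice to fill missing metrics with 0.0, and the threshold are all hypothetical. The abstract does not name the machine-learning algorithm, so a simple per-feature z-score check stands in for it here.

```python
from statistics import mean, pstdev

# Hypothetical feature names; a real deployment would take these from
# whatever each monitoring tool actually exports.
FEATURES = [
    "nagios.ping_ms",
    "ganglia.cpu_load",
    "ganglia.mem_used_pct",
    "torque.running_jobs",
]


def build_vector(sources, feature_order):
    """Flatten per-source metric dicts into one vector with a fixed order."""
    merged = {}
    for source_name, metrics in sources.items():
        for metric_name, value in metrics.items():
            merged[f"{source_name}.{metric_name}"] = float(value)
    # Assumption: a missing metric is filled with 0.0 so that every
    # vector has the same length.
    return [merged.get(name, 0.0) for name in feature_order]


def fit_baseline(vectors):
    """Per-feature mean and standard deviation over healthy samples."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # A constant feature has zero deviation; use 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def anomaly_score(vector, means, stds):
    """Largest absolute z-score across all features (stand-in detector)."""
    return max(abs(x - m) / s for x, m, s in zip(vector, means, stds))


# Made-up samples for one node, as three tools might report them.
healthy_samples = [
    {"nagios": {"ping_ms": 1.0}, "ganglia": {"cpu_load": 0.5, "mem_used_pct": 40}, "torque": {"running_jobs": 8}},
    {"nagios": {"ping_ms": 1.2}, "ganglia": {"cpu_load": 0.6, "mem_used_pct": 42}, "torque": {"running_jobs": 8}},
    {"nagios": {"ping_ms": 0.8}, "ganglia": {"cpu_load": 0.4, "mem_used_pct": 38}, "torque": {"running_jobs": 8}},
    {"nagios": {"ping_ms": 1.0}, "ganglia": {"cpu_load": 0.5, "mem_used_pct": 40}, "torque": {"running_jobs": 8}},
]
suspect_sample = {
    "nagios": {"ping_ms": 1.1},
    "ganglia": {"cpu_load": 0.55, "mem_used_pct": 95},
    "torque": {"running_jobs": 8},
}

THRESHOLD = 3.0  # hypothetical cut-off, not taken from the source

healthy_vectors = [build_vector(s, FEATURES) for s in healthy_samples]
means, stds = fit_baseline(healthy_vectors)
suspect_vector = build_vector(suspect_sample, FEATURES)

if __name__ == "__main__":
    score = anomaly_score(suspect_vector, means, stds)
    print("anomalous" if score > THRESHOLD else "normal")
```

Prefixing each metric with its source name keeps features from different tools from colliding, and the fixed `FEATURES` order makes vectors comparable across nodes and over time. A real system would likely replace the z-score check with a trained model.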