
LHCb Online Log Analysis and Maintenance System

History has shown that computer logs are often the only information an administrator has about an incident, whether it was caused by a malfunction or an attack. Because large-scale IT infrastructures such as LHCb Online produce a huge volume of logs, critical information...


Bibliographic Details
Main authors: Garnier, J-C, Brarda, L, Neufeld, N, Nikolaidis, F
Language: eng
Published: 2011
Subjects: Computing and Computers
Online access: http://cds.cern.ch/record/1565102
author Garnier, J-C
Brarda, L
Neufeld, N
Nikolaidis, F
collection CERN
description History has shown that computer logs are often the only information an administrator has about an incident, whether it was caused by a malfunction or an attack. Because large-scale IT infrastructures such as LHCb Online produce a huge volume of logs, critical information may be overlooked or simply drowned in a sea of other messages. This demonstrates the need for an automatic system for long-term maintenance and real-time analysis of the logs. We have constructed a low-cost, fault-tolerant centralized logging system that can perform in-depth analysis and cross-correlation of every log. The system can handle O(10000) different log sources and numerous formats while keeping the overhead as low as possible. It provides log gathering and management, offline analysis, and online analysis. We call offline analysis the procedure of analyzing old logs for critical information, while online analysis refers to the procedure of early alerting and reacting. The system is extensible and cooperates well with other applications such as Intrusion Detection / Prevention Systems. This paper presents the LHCb Online topology, the problems we had to overcome, and our solutions. Special emphasis is given to log analysis, how we use it for monitoring, and how we maintain uninterrupted access to the logs. We provide performance plots, code modifications to well-known log tools, and our experience from trying various storage strategies.
id cern-1565102
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2011
record_format invenio
spelling cern-1565102 | 2022-08-17T13:24:44Z | http://cds.cern.ch/record/1565102 | eng | Garnier, J-C; Brarda, L; Neufeld, N; Nikolaidis, F | LHCb Online Log Analysis and Maintenance System | Computing and Computers | oai:cds.cern.ch:1565102 | 2011
title LHCb Online Log Analysis and Maintenance System
topic Computing and Computers
url http://cds.cern.ch/record/1565102