Association rule mining on grid monitoring data to detect error sources

Error handling is a crucial task in an infrastructure as complex as a grid. There are several monitoring tools put in place, which report failing grid jobs including exit codes. However, the exit codes do not always denote the actual fault, which caused the job failure. Human time and knowledge is r...

Bibliographic Details
Main authors: Maier, G; Schiffers, M; Kranzlmueller, D; Gaidioz, B
Language: eng
Published: 2010
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/219/7/072041
http://cds.cern.ch/record/1270545
author Maier, G
Schiffers, M
Kranzlmueller, D
Gaidioz, B
collection CERN
description Error handling is a crucial task in an infrastructure as complex as a grid. Several monitoring tools are in place that report failing grid jobs along with their exit codes. However, the exit codes do not always denote the actual fault that caused the job failure. Human time and knowledge are required to manually trace errors back to the underlying fault. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. In this way, problematic grid components are located automatically, and this information, expressed as association rules, is visualized in a web interface. This work reduces the time needed for fault recovery and improves a grid's reliability.
id cern-1270545
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2010
record_format invenio
title Association rule mining on grid monitoring data to detect error sources
topic Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/219/7/072041
http://cds.cern.ch/record/1270545