Association rule mining on grid monitoring data to detect error sources
Error handling is a crucial task in an infrastructure as complex as a grid. Several monitoring tools are in place that report failing grid jobs together with their exit codes. However, the exit codes do not always identify the actual fault that caused the job failure. Human time and knowledge are r...
Main authors: | Maier, G; Schiffers, M; Kranzlmueller, D; Gaidioz, B |
---|---|
Language: | eng |
Published: | 2010 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1088/1742-6596/219/7/072041 http://cds.cern.ch/record/1270545 |
_version_ | 1780920197690949632 |
---|---|
author | Maier, G Schiffers, M Kranzlmueller, D Gaidioz, B |
author_facet | Maier, G Schiffers, M Kranzlmueller, D Gaidioz, B |
author_sort | Maier, G |
collection | CERN |
description | Error handling is a crucial task in an infrastructure as complex as a grid. Several monitoring tools are in place that report failing grid jobs together with their exit codes. However, the exit codes do not always identify the actual fault that caused the job failure. Human time and knowledge are required to manually trace errors back to the real underlying fault. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the behavior of grid components, taking dependencies between grid job characteristics into account. In this way, problematic grid components are located automatically, and this information, expressed as association rules, is visualized in a web interface. This work reduces the time needed for fault recovery and improves a grid's reliability. |
id | cern-1270545 |
institution | Organización Europea para la Investigación Nuclear |
language | eng |
publishDate | 2010 |
record_format | invenio |
spelling | cern-1270545 2022-08-17T13:24:58Z doi:10.1088/1742-6596/219/7/072041 http://cds.cern.ch/record/1270545 eng Maier, G Schiffers, M Kranzlmueller, D Gaidioz, B Association rule mining on grid monitoring data to detect error sources Computing and Computers Error handling is a crucial task in an infrastructure as complex as a grid. There are several monitoring tools put in place, which report failing grid jobs including exit codes. However, the exit codes do not always denote the actual fault, which caused the job failure. Human time and knowledge is required to manually trace back errors to the real fault underlying an error. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behavior by taking dependencies between grid job characteristics into account. Therewith, problematic grid components are located automatically and this information – expressed by association rules – is visualized in a web interface. This work achieves a decrease in time for fault recovery and yields an improvement of a grid's reliability oai:cds.cern.ch:1270545 2010 |
spellingShingle | Computing and Computers Maier, G Schiffers, M Kranzlmueller, D Gaidioz, B Association rule mining on grid monitoring data to detect error sources |
title | Association rule mining on grid monitoring data to detect error sources |
topic | Computing and Computers |
url | https://dx.doi.org/10.1088/1742-6596/219/7/072041 http://cds.cern.ch/record/1270545 |
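
The description above mentions mining association rules from grid job monitoring data to locate problematic components. As a rough illustration of that general idea only, and not the authors' implementation, the following minimal Python sketch counts how often attribute combinations co-occur with failed jobs and keeps the combinations that pass support and confidence thresholds. The job records, attribute names (`site`, `queue`, `status`), the function name `mine_rules`, and the threshold values are all hypothetical and chosen purely for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical monitoring records; attribute names and values are invented.
JOBS = [
    {"site": "A", "queue": "long", "status": "failed"},
    {"site": "A", "queue": "short", "status": "failed"},
    {"site": "A", "queue": "long", "status": "failed"},
    {"site": "B", "queue": "long", "status": "done"},
    {"site": "B", "queue": "short", "status": "done"},
    {"site": "B", "queue": "long", "status": "done"},
    {"site": "C", "queue": "short", "status": "done"},
    {"site": "A", "queue": "short", "status": "done"},
]


def mine_rules(jobs, target=("status", "failed"),
               min_support=0.2, min_confidence=0.7, max_len=2):
    """Return rules 'antecedent -> target' as (antecedent, support, confidence).

    support    = fraction of all jobs matching antecedent AND target
    confidence = fraction of jobs matching antecedent that also match target
    """
    total = len(jobs)
    antecedent_counts = Counter()  # jobs containing the attribute combination
    joint_counts = Counter()       # ... that also match the target

    for job in jobs:
        # Every attribute except the target attribute is a candidate item.
        items = sorted((k, v) for k, v in job.items() if k != target[0])
        hit = job.get(target[0]) == target[1]
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                antecedent_counts[combo] += 1
                if hit:
                    joint_counts[combo] += 1

    rules = []
    for combo, joint in joint_counts.items():
        support = joint / total
        confidence = joint / antecedent_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, support, confidence))

    # Strongest rules first: by confidence, then support, then name.
    rules.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules


if __name__ == "__main__":
    for antecedent, support, confidence in mine_rules(JOBS):
        lhs = " AND ".join(f"{k}={v}" for k, v in antecedent)
        print(f"{lhs} -> status=failed "
              f"(support={support:.3f}, confidence={confidence:.2f})")
```

This brute-force enumeration is only practical for a handful of attributes; a real deployment would more likely rely on an established frequent-itemset algorithm such as Apriori or FP-Growth.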