
Workload modelling for data-intensive systems


Bibliographic Details
Main author: Lassnig, Mario
Language: eng
Published: 2016
Online access: http://cds.cern.ch/record/2235088
_version_ 1780952761892864000
author Lassnig, Mario
author_facet Lassnig, Mario
author_sort Lassnig, Mario
collection CERN
description This thesis presents a comprehensive study built upon the requirements of a global data-intensive system developed for the ATLAS Experiment at CERN's Large Hadron Collider. First, a scalable method is described to capture distributed data management operations in a non-intrusive way. These operations are collected into a globally synchronised sequence of events, the workload. A comparative analysis of this new data-intensive workload against existing computational workloads is conducted, leading to the discovery of the importance of descriptive attributes in the operations. Existing computational workload models only consider the arrival rates of operations; in data-intensive systems, however, the correlations between attributes play a central role. Furthermore, the detrimental effect of rapid correlated arrivals, so-called bursts, is assessed. A model is proposed that can learn burst behaviour from a captured workload and, in turn, forecast potential future bursts. To help with the creation of a full representative workload model, a similarity measure is proposed that assesses the internal structure of the workload in a two-step method: the time-dependent attribute is decomposed via wavelet transformation, and descriptive attributes are learnt via association rule mining. Finally, an analytical workload model is proposed that supports the inherent features of data-intensive systems without a learning step. That way, potential future systems in development can use a workload that is representative of data-intensive systems even when no particular historical data is available.
id cern-2235088
institution European Organization for Nuclear Research
language eng
publishDate 2016
record_format invenio
spelling cern-2235088 2019-09-30T06:29:59Z http://cds.cern.ch/record/2235088 eng Lassnig, Mario Workload modelling for data-intensive systems CERN-THESIS-2014-379 oai:cds.cern.ch:2235088 2016-11-23T10:40:32Z
spellingShingle Lassnig, Mario
Workload modelling for data-intensive systems
title Workload modelling for data-intensive systems
title_full Workload modelling for data-intensive systems
title_fullStr Workload modelling for data-intensive systems
title_full_unstemmed Workload modelling for data-intensive systems
title_short Workload modelling for data-intensive systems
title_sort workload modelling for data-intensive systems
url http://cds.cern.ch/record/2235088
work_keys_str_mv AT lassnigmario workloadmodellingfordataintensivesystems