Hybrid simulation models for data-intensive systems

Data-intensive systems are used to access and store massive amounts of data by combining the storage resources of multiple data-centers, usually deployed all over the world, in one system. This enables users to utilize these massive storage capabilities in a simple and efficient way. However, with t...

Bibliographic Details
Main author:	Barisits, Martin
Language:	eng
Published:	2017
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/2262420

_version_	1780954140025815040
author	Barisits, Martin
collection	CERN
description	Data-intensive systems are used to access and store massive amounts of data by combining the storage resources of multiple data centers, usually deployed all over the world, in one system. This enables users to utilize these massive storage capabilities in a simple and efficient way. However, as these systems grow, it becomes hard to estimate the effects of modifications to the system, such as data placement algorithms or hardware upgrades, and to validate these changes for potential side effects. This thesis addresses the modeling of operational data-intensive systems and presents a novel simulation model which estimates the performance of system operations. The running example used throughout this thesis is the data-intensive system Rucio, which is used as the data management system of the ATLAS experiment at CERN’s Large Hadron Collider. Existing system models in the literature are not applicable to data-intensive workflows, as they only consider computational workflows or make assumptions which do not hold for operational systems. A hybrid modeling approach is proposed which addresses the limits of these models. It partitions the system into discrete components, creates models for these components, and combines them into one concise system model. However, each component model is built only on observed data metrics, such as system traces. The identification of which system components to model and which ones to omit is based on a quantitative system analysis of the Rucio data-intensive system. The storage, network, data integrity validation, and services components were identified. An existing model from the literature was utilized for the network component. For the other components, models based on machine learning techniques are created and evaluated against historic workloads from the running example. The component models are unified in an event simulator and evaluated against historic workloads from the Rucio data-intensive system. The hybrid system model is shown to have a median relative evaluation error of 22%.
id	cern-2262420
institution	Organización Europea para la Investigación Nuclear
language	eng
publishDate	2017
record_format	invenio
spelling	cern-2262420 2019-09-30T06:29:59Z http://cds.cern.ch/record/2262420 eng Barisits, Martin Hybrid simulation models for data-intensive systems Computing and Computers CERN-THESIS-2017-033 oai:cds.cern.ch:2262420 2017-05-05T12:37:40Z
spellingShingle	Computing and Computers Barisits, Martin Hybrid simulation models for data-intensive systems
title	Hybrid simulation models for data-intensive systems
topic	Computing and Computers
url	http://cds.cern.ch/record/2262420
work_keys_str_mv	AT barisitsmartin hybridsimulationmodelsfordataintensivesystems

Hybrid simulation models for data-intensive systems

Similar items