Managing very large distributed data sets on a data grid
In this work we address the management of very large data sets, which need to be stored and processed across many computing sites. The motivation for our work is the ATLAS experiment for the Large Hadron Collider (LHC), where the authors have been involved in the development of the data management m...
Main authors: | Branco, M; de Roure, David; Lassnig, Mario; Zaluska, Ed; Garonne, Vincent |
---|---|
Language: | eng |
Published: | 2010 |
Subjects: | Computing and Computers |
Online access: | https://dx.doi.org/10.1002/cpe.1489 http://cds.cern.ch/record/1359329 |
_version_ | 1780922630253051904 |
---|---|
author | Branco, M de Roure, David Lassnig, Mario Zaluska, Ed Garonne, Vincent |
author_facet | Branco, M de Roure, David Lassnig, Mario Zaluska, Ed Garonne, Vincent |
author_sort | Branco, M |
collection | CERN |
description | In this work we address the management of very large data sets, which need to be stored and processed across many computing sites. The motivation for our work is the ATLAS experiment for the Large Hadron Collider (LHC), where the authors have been involved in the development of the data management middleware. This middleware, called DQ2, has been used for the last several years by the ATLAS experiment for shipping petabytes of data to research centres and universities worldwide. We describe our experience in developing and deploying DQ2 on the Worldwide LHC Computing Grid, a production Grid infrastructure formed of hundreds of computing sites. From this operational experience, we have identified an important degree of uncertainty that underlies the behaviour of large Grid infrastructures. This uncertainty is subjected to a detailed analysis, leading us to present novel modelling and simulation techniques for Data Grids. In addition, we discuss what we perceive as practical limits to the development of data distribution algorithms for Data Grids given the underlying infrastructure uncertainty, and propose future research directions. Copyright (C) 2009 John Wiley & Sons, Ltd. |
id | cern-1359329 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2010 |
record_format | invenio |
spelling | cern-1359329 2019-09-30T06:29:59Z doi:10.1002/cpe.1489 http://cds.cern.ch/record/1359329 eng Branco, M de Roure, David Lassnig, Mario Zaluska, Ed Garonne, Vincent Managing very large distributed data sets on a data grid Computing and Computers In this work we address the management of very large data sets, which need to be stored and processed across many computing sites. The motivation for our work is the ATLAS experiment for the Large Hadron Collider (LHC), where the authors have been involved in the development of the data management middleware. This middleware, called DQ2, has been used for the last several years by the ATLAS experiment for shipping petabytes of data to research centres and universities worldwide. We describe our experience in developing and deploying DQ2 on the Worldwide LHC Computing Grid, a production Grid infrastructure formed of hundreds of computing sites. From this operational experience, we have identified an important degree of uncertainty that underlies the behaviour of large Grid infrastructures. This uncertainty is subjected to a detailed analysis, leading us to present novel modelling and simulation techniques for Data Grids. In addition, we discuss what we perceive as practical limits to the development of data distribution algorithms for Data Grids given the underlying infrastructure uncertainty, and propose future research directions. Copyright (C) 2009 John Wiley & Sons, Ltd. oai:cds.cern.ch:1359329 2010 |
spellingShingle | Computing and Computers Branco, M de Roure, David Lassnig, Mario Zaluska, Ed Garonne, Vincent Managing very large distributed data sets on a data grid |
title | Managing very large distributed data sets on a data grid |
title_full | Managing very large distributed data sets on a data grid |
title_fullStr | Managing very large distributed data sets on a data grid |
title_full_unstemmed | Managing very large distributed data sets on a data grid |
title_short | Managing very large distributed data sets on a data grid |
title_sort | managing very large distributed data sets on a data grid |
topic | Computing and Computers |
url | https://dx.doi.org/10.1002/cpe.1489 http://cds.cern.ch/record/1359329 |
work_keys_str_mv | AT brancom managingverylargedistributeddatasetsonadatagrid AT derouredavid managingverylargedistributeddatasetsonadatagrid AT lassnigmario managingverylargedistributeddatasetsonadatagrid AT zaluskaed managingverylargedistributeddatasetsonadatagrid AT garonnevincent managingverylargedistributeddatasetsonadatagrid |