The Repack Challenge
Physics data stored on CERN tapes is quickly approaching the 100 PB milestone. Tape is an ever-evolving technology that still follows Moore's law in terms of capacity, which means that every year we can store more and more data on the same number of tapes. However, this does not come for free: t...
Main author: | Kruse, Daniele Francesco |
---|---|
Language: | eng |
Published: | 2014 |
Subjects: | Detectors and Experimental Techniques; Computing and Computers |
Online access: | https://dx.doi.org/10.1088/1742-6596/513/4/042028 http://cds.cern.ch/record/2026334 |
_version_ | 1780947349982412800 |
author | Kruse, Daniele Francesco |
author_facet | Kruse, Daniele Francesco |
author_sort | Kruse, Daniele Francesco |
collection | CERN |
description | Physics data stored on CERN tapes is quickly approaching the 100 PB milestone. Tape is an ever-evolving technology that still follows Moore's law in terms of capacity, which means that every year we can store more and more data on the same number of tapes. However, this does not come for free: the first, obvious cost is the new higher-capacity media. The second, less-known cost is moving the data from the old tapes to the new ones. This activity is what we call repack. Repack is vital for any large tape user: without it, one would have to buy more tape libraries and more floor space and, eventually, data on old unsupported tapes would become unreadable and be lost forever. In this paper we describe the challenge of repacking 115 PB before LHC data taking starts at the beginning of 2015. This process will have to run concurrently with the existing experiment tape activities, and therefore needs to be as transparent as possible for users. Making sure that this works out seamlessly implies careful planning of the resources and of the various policies for sharing them fairly and conveniently. To tackle this problem we need to fully exploit the speed and throughput of our modern tape drives. This involves proper dimensioning and configuration of the disk arrays and of all the links between them and the tape servers, i.e. the machines responsible for managing the tape drives. It is equally important to provide tools that improve the efficiency with which we use our tape libraries. The new repack setup we deployed has on average increased tape drive throughput by 80%, allowing the drives to perform closer to their design specifications. This improvement in turn means a 48% decrease in the number of drives needed to achieve the required throughput to complete the full repack on time. |
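The drive-count figures in the abstract follow from straightforward throughput dimensioning: the number of drives needed is the total volume divided by what one drive can move within the repack window. A minimal back-of-envelope sketch of that arithmetic follows; only the 115 PB total and the roughly 80% per-drive throughput gain come from the record, while the repack window and the baseline drive rate are assumed values for illustration.

```python
import math

PB = 10**15                          # bytes per petabyte (decimal)
TOTAL_PB = 115                       # data to repack, from the abstract

deadline_days = 365                  # assumed repack window
baseline_MBps = 140                  # assumed average per-drive rate before tuning
improved_MBps = baseline_MBps * 1.8  # ~80% throughput gain, from the abstract

def drives_needed(throughput_MBps, days=deadline_days):
    """Drives required to move TOTAL_PB within the window at a given per-drive rate."""
    seconds = days * 86400
    bytes_per_drive = throughput_MBps * 1e6 * seconds
    return math.ceil(TOTAL_PB * PB / bytes_per_drive)

before = drives_needed(baseline_MBps)
after = drives_needed(improved_MBps)
print(before, after)  # → 27 15 under these assumed rates
```

Under these assumptions the improved setup needs roughly 44% fewer drives, in the same range as the 48% reduction reported in the abstract (which is based on the actual measured averages rather than these illustrative figures).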
id | oai-inspirehep.net-1302103 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2014 |
record_format | invenio |
spelling | oai-inspirehep.net-1302103 2022-08-17T13:29:08Z doi:10.1088/1742-6596/513/4/042028 http://cds.cern.ch/record/2026334 eng Kruse, Daniele Francesco The Repack Challenge Detectors and Experimental Techniques Computing and Computers oai:inspirehep.net:1302103 2014 |
spellingShingle | Detectors and Experimental Techniques Computing and Computers Kruse, Daniele Francesco The Repack Challenge |
title | The Repack Challenge |
title_full | The Repack Challenge |
title_fullStr | The Repack Challenge |
title_full_unstemmed | The Repack Challenge |
title_short | The Repack Challenge |
title_sort | repack challenge |
topic | Detectors and Experimental Techniques Computing and Computers |
url | https://dx.doi.org/10.1088/1742-6596/513/4/042028 http://cds.cern.ch/record/2026334 |
work_keys_str_mv | AT krusedanielefrancesco therepackchallenge AT krusedanielefrancesco repackchallenge |