
The Repack Challenge


Bibliographic Details
Main author: Kruse, Daniele Francesco
Language: eng
Published: 2014
Subjects: Detectors and Experimental Techniques; Computing and Computers
Online access: https://dx.doi.org/10.1088/1742-6596/513/4/042028
http://cds.cern.ch/record/2026334
author Kruse, Daniele Francesco
collection CERN
description Physics data stored on CERN tapes is quickly approaching the 100 PB milestone. Tape is an ever-evolving technology whose capacity still follows Moore's law, which means that every year we can store more and more data on the same number of tapes. However, this does not come for free: the first, obvious cost is the new higher-capacity media. The second, less-known cost is that of moving the data from the old tapes to the new ones, an activity we call repack. Repack is vital for any large tape user: without it, one would have to buy more tape libraries and more floor space and, eventually, data on old unsupported tapes would become unreadable and be lost forever. In this paper we describe the challenge of repacking 115 PB before LHC data taking starts at the beginning of 2015. This process has to run concurrently with existing experiment tape activities, and therefore needs to be as transparent as possible for users. Making sure that this works out seamlessly implies careful planning of the resources and of the policies for sharing them fairly and conveniently. To tackle this problem we need to fully exploit the speed and throughput of our modern tape drives. This involves proper dimensioning and configuration of the disk arrays and of all the links between them and the tape servers, i.e. the machines responsible for managing the tape drives. It is equally important to provide tools that improve the efficiency with which we use our tape libraries. The new repack setup we deployed has increased tape drive throughput by 80% on average, allowing the drives to perform closer to their design specifications. This improvement in turn means a 48% decrease in the number of drives needed to achieve the throughput required to complete the full repack on time.
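The sizing logic sketched in the abstract (data volume and deadline determine the aggregate throughput, which in turn determines the drive count) can be illustrated with a back-of-envelope calculation. The numbers below are illustrative assumptions, not figures from the paper (only the 115 PB volume comes from the abstract; the window, per-drive rate, and efficiency factors are hypothetical):

```python
import math

def drives_needed(volume_pb: float, window_days: float,
                  per_drive_mb_s: float, efficiency: float = 1.0) -> int:
    """Drives required to repack volume_pb petabytes within window_days,
    given a drive's native throughput (MB/s) scaled by an efficiency
    factor (the fraction of the native rate actually sustained)."""
    total_mb = volume_pb * 1e9             # 1 PB = 1e9 MB (decimal units)
    seconds = window_days * 86400
    required_mb_s = total_mb / seconds     # aggregate throughput needed
    return math.ceil(required_mb_s / (per_drive_mb_s * efficiency))

# Hypothetical scenario: 115 PB in ~600 days on 250 MB/s drives.
# An 80% throughput gain (efficiency 0.45 -> 0.81) sharply cuts the
# number of drives needed to meet the deadline.
before = drives_needed(115, 600, 250, efficiency=0.45)
after = drives_needed(115, 600, 250, efficiency=0.45 * 1.8)
```

With these assumed inputs the drive count roughly halves, which is the shape of the 48% reduction the paper reports; the exact figure depends on the real per-drive rates and scheduling constraints.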
id oai-inspirehep.net-1302103
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2014
record_format invenio
title The Repack Challenge
topic Detectors and Experimental Techniques
Computing and Computers
url https://dx.doi.org/10.1088/1742-6596/513/4/042028
http://cds.cern.ch/record/2026334