The Repack Challenge

Bibliographic Details
Main author: Kruse, Daniele Francesco
Language: English
Published: 2014
Subjects:
Online access: https://dx.doi.org/10.1088/1742-6596/513/4/042028
http://cds.cern.ch/record/2026334
Description
Summary: Physics data stored on CERN tapes is quickly reaching the 100 PB milestone. Tape is an ever-changing technology that is still following Moore's law in terms of capacity. This means that every year we can store more data on the same number of tapes. However, this doesn't come for free: the first, obvious cost is the new higher-capacity media. The second, less well-known cost is that of moving the data from the old tapes to the new ones. This activity is what we call repack. Repack is vital for any large tape user: without it, one would have to buy more tape libraries and more floor space and, eventually, data on old, unsupported tapes would become unreadable and be lost forever. In this paper we describe the challenge of repacking 115 PB before LHC data taking starts at the beginning of 2015. This process will have to run concurrently with the existing experiment tape activities, and therefore needs to be as transparent as possible for users. Making sure that this works out seamlessly requires careful planning of the resources and of the policies for sharing them fairly and conveniently. To tackle this problem we need to fully exploit the speed and throughput of our modern tape drives. This involves proper dimensioning and configuration of the disk arrays and of all the links between them and the tape servers, i.e., the machines responsible for managing the tape drives. It is equally important to provide tools that improve the efficiency with which we use our tape libraries. The new repack setup we deployed has increased tape drive throughput by 80% on average, allowing the drives to perform closer to their design specifications. This improvement in turn means a 48% decrease in the number of drives needed to achieve the throughput required to complete the full repack on time.
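
The relation between per-drive throughput and the number of drives required can be illustrated with a rough sizing calculation. The following is a minimal sketch, not the authors' planning model: the function names, the one-year window and the 100 MB/s baseline rate are hypothetical placeholders chosen for illustration, and only the 115 PB volume and the 80% throughput gain come from the summary above.

```python
import math

PB = 10**15            # bytes per petabyte (decimal convention, an assumption)
SECONDS_PER_DAY = 86400


def drives_needed(volume_bytes, window_days, rate_bytes_per_s):
    """Drives that must stream concurrently to move volume_bytes within
    window_days, if each sustains rate_bytes_per_s (idealised: no mount
    time, no contention with other tape activity)."""
    seconds = window_days * SECONDS_PER_DAY
    return math.ceil(volume_bytes / (seconds * rate_bytes_per_s))


def relative_drive_reduction(throughput_gain):
    """Fractional drop in drive count when per-drive throughput rises by
    throughput_gain (0.8 means +80%), with the required aggregate
    throughput held fixed."""
    return 1.0 - 1.0 / (1.0 + throughput_gain)


# Hypothetical inputs: 115 PB in 365 days, 100 MB/s per drive before tuning.
baseline = drives_needed(115 * PB, 365, 100e6)
improved = drives_needed(115 * PB, 365, 100e6 * 1.8)
```

With these placeholder inputs, `baseline` is 37 drives and `improved` is 21. Under this idealised model an 80% throughput gain corresponds to a reduction of about 44% in drive count; the 48% figure in the summary comes from the authors' own measurements and planning, whose details are not given here, so it need not match this simplified arithmetic.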