Addressing a billion-entries multi-petabyte distributed file system backup problem with cback: from files to objects

CERNBox is the cloud collaboration hub at CERN. The service has more than 37,000 user accounts. The backup of user and project spaces data is critical for the service. The underlying storage system hosts over a billion files which amount to 12PB of storage distributed over thousands of disks with a...

Bibliographic Details
Main authors: Cameselle, Roberto Valverde; Gonzalez Labrador, Hugo
Language: eng
Published: 2021
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1051/epjconf/202125102071
http://cds.cern.ch/record/2814362
_version_ 1780973442624913408
author Cameselle, Roberto Valverde
Gonzalez Labrador, Hugo
collection CERN
description CERNBox is the cloud collaboration hub at CERN. The service has more than 37,000 user accounts. The backup of user and project spaces data is critical for the service. The underlying storage system hosts over a billion files which amount to 12PB of storage distributed over thousands of disks with a two-replica layout. Performing a backup operation over this vast amount of data and number of files is a non-trivial task. The original CERNBox backup system (an in-house event-driven file-level system) has been reconsidered and replaced by a new distributed and scalable backup infrastructure based on the open source tool RESTIC. The new system, codenamed cback, provides features needed in the HEP community to guarantee data safety and smooth operation for the system administrators. Daily snapshot-based backups of all our user and project areas, along with automatic verification and restores, are possible with this new development. The backup data is also de-duplicated in blocks and stored as objects in a disk-based S3 cluster in another geographical location on the CERN campus, reducing storage costs and protecting critical data from major catastrophic events. We report on the design and operational experience of running the system and on possible future improvements.
id cern-2814362
institution European Organization for Nuclear Research
language eng
publishDate 2021
record_format invenio
spelling cern-2814362 2022-07-02T18:07:41Z doi:10.1051/epjconf/202125102071 http://cds.cern.ch/record/2814362 eng oai:cds.cern.ch:2814362 2021
title Addressing a billion-entries multi-petabyte distributed file system backup problem with cback: from files to objects
topic Computing and Computers
url https://dx.doi.org/10.1051/epjconf/202125102071
http://cds.cern.ch/record/2814362