
The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service

CERN produces a large variety of research data. This data plays an important role in CERN’s heritage and is often unique. As a public institute, it is CERN’s responsibility to preserve current and future research data. To fulfil this responsibility, CERN wants to build an “Archive as a Service” that...


Bibliographic Details
Main author: van Kemenade, Jorik
Language: eng
Published: 2020
Subjects: Information Transfer and Management; Computing and Computers; Digital Memory
Online access: http://cds.cern.ch/record/2728246
author van Kemenade, Jorik
collection CERN
description CERN produces a large variety of research data. This data plays an important role in CERN’s heritage and is often unique. As a public institute, it is CERN’s responsibility to preserve current and future research data. To fulfil this responsibility, CERN wants to build an “Archive as a Service” that enables researchers to conveniently preserve their valuable research. In this thesis we investigate a possible strategy for building a CERN-wide archiving service using an existing preservation tool, Archivematica. Building an archival service at CERN scale poses at least three challenges. 1) The amount of data: CERN currently stores more than 300 PB of data. 2) Preservation of versioned data: research is often a series of small but important changes. This history needs to be preserved without duplicating very large datasets. 3) The variety of systems and workflows: with more than 17,500 researchers, the preservation platform needs to integrate with many different workflows and content delivery systems. The main objective of this research is to evaluate whether Archivematica can be used as the main component of a digital archiving service at CERN. We discuss how we created a distributed deployment of Archivematica and increased our video processing capacity from 2.5 terabytes per month to approximately 15 terabytes per month. We present a strategy for preserving versioned research data without creating duplicate artefacts. Finally, we evaluate three methods for integrating Archivematica with digital repositories and other digital workflows.
id cern-2728246
institution European Organization for Nuclear Research (CERN)
language eng
publishDate 2020
record_format invenio
spelling cern-2728246 2021-06-15T08:16:48Z http://cds.cern.ch/record/2728246 eng CERN-THESIS-2020-092 oai:cds.cern.ch:2728246 2020-08-17T12:37:30Z
title The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service
topic Information Transfer and Management
Computing and Computers
Digital Memory
url http://cds.cern.ch/record/2728246