The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service
CERN produces a large variety of research data. This data plays an important role in CERN’s heritage and is often unique. As a public institute, it is CERN’s responsibility to preserve current and future research data. To fulfil this responsibility, CERN wants to build an “Archive as a Service” that...
Main author: | van Kemenade, Jorik |
---|---|
Language: | eng |
Published: | 2020 |
Subjects: | Information Transfer and Management; Computing and Computers; Digital Memory |
Online access: | http://cds.cern.ch/record/2728246 |
_version_ | 1780966368568410112 |
---|---|
author | van Kemenade, Jorik |
author_facet | van Kemenade, Jorik |
author_sort | van Kemenade, Jorik |
collection | CERN |
description | CERN produces a large variety of research data. This data plays an important role in CERN’s heritage and is often unique. As a public institute, it is CERN’s responsibility to preserve current and future research data. To fulfil this responsibility, CERN wants to build an “Archive as a Service” that enables researchers to conveniently preserve their valuable research. In this thesis we investigate a possible strategy for building a CERN-wide archiving service using an existing preservation tool, Archivematica. Building an archival service at CERN scale poses at least three challenges. 1) The amount of data: CERN currently stores more than 300 PB of data. 2) Preservation of versioned data: research is often a series of small but important changes. This history needs to be preserved without duplicating very large datasets. 3) The variety of systems and workflows: with more than 17,500 researchers, the preservation platform needs to integrate with many different workflows and content delivery systems. The main objective of this research is to evaluate whether Archivematica can be used as the main component of a digital archiving service at CERN. We discuss how we created a distributed deployment of Archivematica and increased our video processing capacity from 2.5 terabytes per month to approximately 15 terabytes per month. We present a strategy for preserving versioned research data without creating duplicate artefacts. Finally, we evaluate three methods for integrating Archivematica with digital repositories and other digital workflows. |
id | cern-2728246 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2020 |
record_format | invenio |
spelling | cern-2728246 2021-06-15T08:16:48Z http://cds.cern.ch/record/2728246 eng van Kemenade, Jorik The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service Information Transfer and Management Computing and Computers Digital Memory CERN produces a large variety of research data. This data plays an important role in CERN’s heritage and is often unique. As a public institute, it is CERN’s responsibility to preserve current and future research data. To fulfil this responsibility, CERN wants to build an “Archive as a Service” that enables researchers to conveniently preserve their valuable research. In this thesis we investigate a possible strategy for building a CERN-wide archiving service using an existing preservation tool, Archivematica. Building an archival service at CERN scale poses at least three challenges. 1) The amount of data: CERN currently stores more than 300 PB of data. 2) Preservation of versioned data: research is often a series of small but important changes. This history needs to be preserved without duplicating very large datasets. 3) The variety of systems and workflows: with more than 17,500 researchers, the preservation platform needs to integrate with many different workflows and content delivery systems. The main objective of this research is to evaluate whether Archivematica can be used as the main component of a digital archiving service at CERN. We discuss how we created a distributed deployment of Archivematica and increased our video processing capacity from 2.5 terabytes per month to approximately 15 terabytes per month. We present a strategy for preserving versioned research data without creating duplicate artefacts. Finally, we evaluate three methods for integrating Archivematica with digital repositories and other digital workflows. CERN-THESIS-2020-092 oai:cds.cern.ch:2728246 2020-08-17T12:37:30Z |
spellingShingle | Information Transfer and Management Computing and Computers Digital Memory van Kemenade, Jorik The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service |
title | The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service |
title_full | The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service |
title_fullStr | The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service |
title_full_unstemmed | The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service |
title_short | The CERN Digital Memory Platform: Building a CERN scale OAIS compliant Archival Service |
title_sort | cern digital memory platform: building a cern scale oais compliant archival service |
topic | Information Transfer and Management Computing and Computers Digital Memory |
url | http://cds.cern.ch/record/2728246 |
work_keys_str_mv | AT vankemenadejorik thecerndigitalmemoryplatformbuildingacernscaleoaiscompliantarchivalservice AT vankemenadejorik cerndigitalmemoryplatformbuildingacernscaleoaiscompliantarchivalservice |