Scalable Metadata Management Using Onedata and OpenFaaS

Bibliographic Details
Main author: Dutka, Lukasz
Language: English
Published: 2021
Subjects: HEP Computing
Online access: http://cds.cern.ch/record/2750561

Onedata [1] is a global, high-performance, transparent data management system that unifies data access across globally distributed infrastructures and multiple types of underlying storage, such as NFS, Amazon S3, Ceph, OpenStack Swift, WebDAV, XRootD, HTTP and HTTPS servers, and other POSIX-compliant file systems. Onedata allows users to collaborate, share, and perform computations on data using applications that rely on POSIX-compliant data access. Thanks to its fully distributed architecture, Onedata enables complex hybrid-cloud deployments that combine private and commercial cloud resources. Onedata comprises the following services: Onezone, the authorisation and distributed metadata management component that provides access to the Onedata ecosystem; Oneprovider, which provides the actual data to users and exposes storage systems to Onedata; and Oneclient, which allows transparent POSIX-compatible data access on user nodes. Oneprovider instances can be deployed as a single node or as an HPC cluster on top of high-performance parallel storage solutions, with the ability to serve petabytes of data at GB/s throughput. Onedata introduces the concept of a Space: a virtual volume, owned by one or more users, in which they can organize their data under a global namespace. Spaces are accessible through a web interface that allows Dropbox-like file management, a FUSE-based client that can be mounted as a virtual POSIX file system, a Python library (OnedataFS [2]), or standardized REST and CDMI APIs. As a distributed system, Onedata can take advantage of modern scalable solutions such as Kubernetes, and thanks to its rich set of REST APIs and the OnedataFS library, it can process both data and metadata at scale using FaaS systems such as OpenFaaS.
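The abstract states that Onedata's REST APIs and the OnedataFS library allow data and metadata to be processed at scale with FaaS systems such as OpenFaaS, but it does not show what such a function might look like. The following is a minimal, illustrative sketch in Python, not the author's implementation. It assumes a hypothetical JSON request schema (`name`, `content`) and hypothetical helper names (`extract_metadata`, `fan_out`); only the `handle(req)` entry point follows the shape of the classic OpenFaaS python3 template. A real function would read file content through OnedataFS or the Oneprovider REST API and write the result back as file metadata; that part is deliberately omitted so the sketch runs without a Onedata deployment.

```python
"""Hypothetical OpenFaaS-style metadata extractor (illustrative sketch only)."""
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


def extract_metadata(name: str, content: bytes) -> dict:
    """Compute simple, storage-agnostic metadata for one file's bytes (hypothetical helper)."""
    return {
        "name": name,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
        # Count lines; a final line without a trailing newline also counts.
        "lines": content.count(b"\n")
        + (1 if content and not content.endswith(b"\n") else 0),
    }


def handle(req: str) -> str:
    """Entry point shaped like the classic OpenFaaS python3 template: str in, str out.

    Assumed (hypothetical) request body: {"name": "...", "content": "..."}.
    A real deployment would fetch the file via OnedataFS or the REST API instead.
    """
    body = json.loads(req)
    content = body["content"].encode("utf-8")
    return json.dumps(extract_metadata(body["name"], content))


def fan_out(files: dict, max_workers: int = 8) -> list:
    """Invoke the handler once per file in parallel (stand-in for FaaS autoscaling).

    Results are returned in the same order as the input files.
    """
    requests = [json.dumps({"name": n, "content": c}) for n, c in files.items()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [json.loads(r) for r in pool.map(handle, requests)]


if __name__ == "__main__":
    results = fan_out({"a.txt": "hello\nworld\n", "b.csv": "x,y\n1,2"})
    for r in results:
        print(r["name"], r["size_bytes"], r["lines"])
```

Keeping the handler stateless (one file in, one metadata record out) is what lets a FaaS platform run many replicas side by side; the thread pool here only imitates that scale-out locally.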
Currently, Onedata is used in the European Open Science Cloud Hub [3], PRACE-5IP [4], EOSC Synergy [5], and Archiver [6] projects, where it provides a data transparency layer for computations deployed on hybrid clouds.

Acknowledgements: This work was supported in part by research funds for 2018-2020 allocated within the framework of co-financed international projects (project no. 3905/H2020/2018/2 and project no. 3933/H2020/2018/2).

References:
[1] Onedata project website. http://onedata.org.
[2] OnedataFS - PyFilesystem Interface to Onedata Virtual File System. https://github.com/onedata/fs-onedatafs.
[3] European Open Science Cloud Hub (bringing together multiple service providers to create a single contact point for European researchers and innovators). https://www.eosc-hub.eu.
[4] Partnership for Advanced Computing in Europe - Fifth Implementation Phase. http://www.prace-ri.eu.
[5] European Open Science Cloud - Expanding Capacities by building Capabilities. https://www.eosc-synergy.eu.
[6] Archiver - Archiving and Preservation for Research Environments. https://www.archiver-project.eu.

Institution: European Organization for Nuclear Research (CERN)
Event: CS3 2021 - Cloud Storage Synchronization and Sharing