Scalable Metadata Management Using Onedata and OpenFaaS
Onedata [1] is a global high-performance, transparent data management system, that unifies data access across globally distributed infrastructures and multiple types of underlying storages, such as NFS, Amazon S3, Ceph, OpenStack Swift, WebDAV, XRootD and HTTP and HTTPS servers, as...
Main author: | Dutka, Lukasz |
---|---|
Language: | eng |
Published: | 2021 |
Subjects: | HEP Computing |
Online access: | http://cds.cern.ch/record/2750561 |
_version_ | 1780969137303977984 |
---|---|
author | Dutka, Lukasz |
author_facet | Dutka, Lukasz |
author_sort | Dutka, Lukasz |
collection | CERN |
description | Onedata [1] is a global, high-performance, transparent data management system that unifies data access across globally distributed infrastructures and multiple types of underlying storage, such as NFS, Amazon S3, Ceph, OpenStack Swift, WebDAV, XRootD, and HTTP and HTTPS servers, as well as other POSIX-compliant file systems.
Onedata allows users to collaborate, share, and perform computations on data using applications that rely on POSIX-compliant data access. Thanks to a fully distributed architecture, Onedata allows for the creation of complex hybrid-cloud infrastructure deployments, including private and commercial cloud resources.
Onedata comprises the following services: Onezone, the authorisation and distributed metadata management component that provides access to the Onedata ecosystem; Oneprovider, which provides actual data to users and exposes storage systems to Onedata; and Oneclient, which allows transparent POSIX-compatible data access on user nodes. Oneprovider instances can be deployed as a single node or an HPC cluster on top of high-performance parallel storage solutions, with the ability to serve petabytes of data with GB/s throughput.
Onedata introduces the concept of a Space, a virtual volume owned by one or more users, where they can organize their data under a global namespace. Spaces are accessible to users via a web interface that allows Dropbox-like file management, a FUSE-based client that can be mounted as a virtual POSIX file system, a Python library (OnedataFS [2]), or standardized REST and CDMI APIs. As a distributed system, Onedata can take advantage of modern scalable solutions like Kubernetes, and thanks to a rich set of REST APIs and the OnedataFS library it can process both data and metadata at scale using FaaS systems like OpenFaaS.
Currently Onedata is used in the European Open Science Cloud Hub [3], PRACE-5IP [4], EOSC Synergy [5], and Archiver [6] projects, where it provides a data transparency layer for computations deployed on hybrid clouds.
Acknowledgements: This work was supported in part by 2018-2020 research funds within the framework of co-financed international projects (project no. 3905/H2020/2018/2 and project no. 3933/H2020/2018/2).
[1] Onedata project website. http://onedata.org.
[2] OnedataFS - PyFilesystem Interface to Onedata Virtual File System. https://github.com/onedata/fs-onedatafs.
[3] European Open Science Cloud Hub (Bringing together multiple service providers to create a single contact point for European researchers and innovators.). https://www.eosc-hub.eu.
[4] Partnership for Advanced Computing in Europe - Fifth Implementation Phase. http://www.prace-ri.eu.
[5] European Open Science Cloud - Expanding Capacities by building Capabilities. https://www.eosc-synergy.eu.
[6] Archiver - Archiving and Preservation for Research Environments. https://www.archiver-project.eu. |
id | cern-2750561 |
institution | European Organization for Nuclear Research |
language | eng |
publishDate | 2021 |
record_format | invenio |
spelling | cern-2750561 2022-11-02T22:25:54Z http://cds.cern.ch/record/2750561 eng Dutka, Lukasz Scalable Metadata Management Using Onedata and OpenFaaS CS3 2021 - Cloud Storage Synchronization and Sharing HEP Computing oai:cds.cern.ch:2750561 2021 |
spellingShingle | HEP Computing Dutka, Lukasz Scalable Metadata Management Using Onedata and OpenFaaS |
title | Scalable Metadata Management Using Onedata and OpenFaaS |
title_full | Scalable Metadata Management Using Onedata and OpenFaaS |
title_fullStr | Scalable Metadata Management Using Onedata and OpenFaaS |
title_full_unstemmed | Scalable Metadata Management Using Onedata and OpenFaaS |
title_short | Scalable Metadata Management Using Onedata and OpenFaaS |
title_sort | scalable metadata management using onedata and openfaas |
topic | HEP Computing |
url | http://cds.cern.ch/record/2750561 |
work_keys_str_mv | AT dutkalukasz scalablemetadatamanagementusingonedataandopenfaas AT dutkalukasz cs32021cloudstoragesynchronizationandsharing |
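
The abstract states that Onedata exposes data through POSIX-compatible access and that data and metadata can be processed at scale with FaaS systems such as OpenFaaS. The following is a minimal sketch of what such a function might look like, not the author's implementation. It assumes the Space is already reachable as an ordinary file path (for example through a Oneclient mount) and follows the `handle(req)` string-in, string-out convention of the OpenFaaS Python template. The event format (`{"path": ...}`), the helper name `extract_metadata`, and the chosen metadata fields are hypothetical and used only for illustration.

```python
import hashlib
import json
import os


def extract_metadata(path, chunk_size=1 << 20):
    """Return basic metadata for one file, read through ordinary POSIX calls.

    Hypothetical helper: the fields below are illustrative, not a Onedata schema.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": path,
        "size_bytes": os.stat(path).st_size,
        "sha256": digest.hexdigest(),
    }


def handle(req):
    """Function entry point: takes a JSON string, returns a JSON string.

    Assumed event format: {"path": "<file path visible to the function>"}.
    """
    event = json.loads(req)
    return json.dumps(extract_metadata(event["path"]))
```

Because each invocation handles a single file and keeps no state, many copies of such a function could run in parallel over the files of a Space, which is the kind of scaling the abstract alludes to. Writing the result back as file metadata (for instance through the REST API or OnedataFS) is left out here, since the record does not describe that step.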