On-demand cloud-based secure environments for analysing personal and health data

Galaxy is the de facto standard workflow manager for bioinformatics, providing a complete collaborative platform for researchers. Even though several Galaxy public servers are currently available, there are some situations where users would benefit more from having full administrativ...

Bibliographic Details
Main author: Tangaro, Marco Antonio
Language: eng
Published: 2023
Subjects: HEP Computing
Online access: http://cds.cern.ch/record/2855378
_version_ 1780977455237955584
author Tangaro, Marco Antonio
author_facet Tangaro, Marco Antonio
author_sort Tangaro, Marco Antonio
collection CERN
description Galaxy is the de facto standard workflow manager for bioinformatics, providing a complete collaborative platform for researchers. Even though several public Galaxy servers are currently available, there are situations where users would benefit more from having full administrative control over a private Galaxy instance. These situations include, but are not limited to, data privacy concerns, the need for customization, the need to prioritise particular job types, the development of tools, and training activities. The Laniakea [1] software platform facilitates the provisioning of on-demand Galaxy instances over heterogeneous cloud infrastructures by leveraging the open-source INDIGO-DataCloud stack [2], which aims to make cloud infrastructures more accessible to scientific communities. End users interact with Laniakea through a web front-end that allows general configuration of the Galaxy instance. The deployment of the virtual hardware and of the Galaxy software ecosystem is subsequently performed by the INDIGO Platform as a Service (PaaS) layer. At the end of the process, the user gains access to a private, production-grade, fully customizable Galaxy virtual instance. Laniakea features the deployment of stand-alone or cluster-backed Galaxy instances, shared reference data volumes, and rapid development of novel Galaxy flavours for specific tasks.

Moreover, to extend the usage of this platform to clinical scenarios, where the analysis of sensitive data in compliance with the GDPR requires strong countermeasures to ensure data privacy and security, Laniakea guarantees the creation of isolated and secure environments for data analysis, exploiting storage encryption and VPN-based access control to Galaxy. Laniakea allows the on-demand encryption of the entire storage volume attached to the virtual machine, using the Linux kernel encryption module. The disk encryption layer is completely transparent to software applications, in this case Galaxy: data are encrypted when written and decrypted when read, on the fly. The procedure has been completely automated through the web Dashboard of the PaaS orchestration service [3], taking advantage of HashiCorp Vault for storing user passphrases. We have implemented a robust mechanism to create secure encryption keys and to prevent user credentials or the encryption passphrase from being transmitted unencrypted to the virtual infrastructure, which would compromise its security. The oral contribution will provide details about the platform architecture and the service implementation strategy.

**References**
[1] Tangaro et al., Laniakea: an open solution to provide Galaxy “on-demand” instances over heterogeneous cloud infrastructures, GigaScience, Volume 9, Issue 4, April 2020, giaa033, https://doi.org/10.1093/gigascience/giaa033
[2] Salomoni, D., Campos, I., Gaido, L. et al., INDIGO-DataCloud: a Platform to Facilitate Seamless Access to E-Infrastructures, J Grid Computing 16, 381–408 (2018), https://doi.org/10.1007/s10723-018-9453-3
[3] https://github.com/indigo-dc/orchestrator
id cern-2855378
institution European Organization for Nuclear Research
language eng
publishDate 2023
record_format invenio
spelling cern-2855378
2023-04-03T19:01:41Z
http://cds.cern.ch/record/2855378
eng
Tangaro, Marco Antonio
On-demand cloud-based secure environments for analysing personal and health data
CS3 2023 - Cloud Storage Synchronization and Sharing
HEP Computing
oai:cds.cern.ch:2855378
2023
spellingShingle HEP Computing
Tangaro, Marco Antonio
On-demand cloud-based secure environments for analysing personal and health data
title On-demand cloud-based secure environments for analysing personal and health data
title_full On-demand cloud-based secure environments for analysing personal and health data
title_fullStr On-demand cloud-based secure environments for analysing personal and health data
title_full_unstemmed On-demand cloud-based secure environments for analysing personal and health data
title_short On-demand cloud-based secure environments for analysing personal and health data
title_sort on-demand cloud-based secure environments for analysing personal and health data
topic HEP Computing
url http://cds.cern.ch/record/2855378
work_keys_str_mv AT tangaromarcoantonio ondemandcloudbasedsecureenvironmentsforanalysingpersonalandhealthdata
AT tangaromarcoantonio cs32023cloudstoragesynchronizationandsharing