
Enabling ad-hoc reuse of private data repositories through schema extraction

BACKGROUND: Sharing sensitive data across organizational boundaries is often significantly limited by legal and ethical restrictions. Regulations such as the EU General Data Protection Regulation (GDPR) impose strict requirements concerning the protection of personal and privacy-sensitive data. Therefore...


Bibliographic Details
Main Authors:	Gleim, Lars Christoph, Karim, Md Rezaul, Zimmermann, Lukas, Kohlbacher, Oliver, Stenzhorn, Holger, Decker, Stefan, Beyan, Oya
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7341611/ https://www.ncbi.nlm.nih.gov/pubmed/32641124 http://dx.doi.org/10.1186/s13326-020-00223-z

author	Gleim, Lars Christoph Karim, Md Rezaul Zimmermann, Lukas Kohlbacher, Oliver Stenzhorn, Holger Decker, Stefan Beyan, Oya
author_sort	Gleim, Lars Christoph
collection	PubMed
description	BACKGROUND: Sharing sensitive data across organizational boundaries is often significantly limited by legal and ethical restrictions. Regulations such as the EU General Data Protection Regulation (GDPR) impose strict requirements concerning the protection of personal and privacy-sensitive data. Therefore, new approaches, such as the Personal Health Train initiative, are emerging to utilize data right in their original repositories, circumventing the need to transfer data. RESULTS: Circumventing limitations of previous systems, this paper proposes a configurable and automated schema extraction and publishing approach, which enables ad-hoc SPARQL query formulation against RDF triple stores without requiring direct access to the private data. The approach is compatible with existing Semantic Web-based technologies and allows for the subsequent execution of such queries in a safe setting under the data provider’s control. Evaluation with four distinct datasets shows that a configurable amount of concise and task-relevant schema, closely describing the structure of the underlying data, was derived, enabling the schema introspection-assisted authoring of SPARQL queries. CONCLUSIONS: Automatically extracting and publishing data schema can enable the introspection-assisted creation of data selection and integration queries. In conjunction with the presented system architecture, this approach can enable reuse of data from private repositories and in settings where agreeing upon a shared schema and encoding a priori is infeasible. As such, it could provide an important step towards reuse of data from previously inaccessible sources and thus towards the proliferation of data-driven methods in the biomedical domain.
format	Online Article Text
id	pubmed-7341611
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-73416112020-07-14 Enabling ad-hoc reuse of private data repositories through schema extraction Gleim, Lars Christoph Karim, Md Rezaul Zimmermann, Lukas Kohlbacher, Oliver Stenzhorn, Holger Decker, Stefan Beyan, Oya J Biomed Semantics Research BACKGROUND: Sharing sensitive data across organizational boundaries is often significantly limited by legal and ethical restrictions. Regulations such as the EU General Data Protection Regulation (GDPR) impose strict requirements concerning the protection of personal and privacy-sensitive data. Therefore, new approaches, such as the Personal Health Train initiative, are emerging to utilize data right in their original repositories, circumventing the need to transfer data. RESULTS: Circumventing limitations of previous systems, this paper proposes a configurable and automated schema extraction and publishing approach, which enables ad-hoc SPARQL query formulation against RDF triple stores without requiring direct access to the private data. The approach is compatible with existing Semantic Web-based technologies and allows for the subsequent execution of such queries in a safe setting under the data provider’s control. Evaluation with four distinct datasets shows that a configurable amount of concise and task-relevant schema, closely describing the structure of the underlying data, was derived, enabling the schema introspection-assisted authoring of SPARQL queries. CONCLUSIONS: Automatically extracting and publishing data schema can enable the introspection-assisted creation of data selection and integration queries. In conjunction with the presented system architecture, this approach can enable reuse of data from private repositories and in settings where agreeing upon a shared schema and encoding a priori is infeasible. As such, it could provide an important step towards reuse of data from previously inaccessible sources and thus towards the proliferation of data-driven methods in the biomedical domain. 
BioMed Central 2020-07-08 /pmc/articles/PMC7341611/ /pubmed/32641124 http://dx.doi.org/10.1186/s13326-020-00223-z Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Enabling ad-hoc reuse of private data repositories through schema extraction
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7341611/ https://www.ncbi.nlm.nih.gov/pubmed/32641124 http://dx.doi.org/10.1186/s13326-020-00223-z
