Efficient logging and querying for blockchain-based cross-site genomic dataset access audit

Bibliographic Details
Main Authors: Ma, Shuaicheng, Cao, Yang, Xiong, Li
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372873/
https://www.ncbi.nlm.nih.gov/pubmed/32693835
http://dx.doi.org/10.1186/s12920-020-0725-y
_version_ 1783561400459722752
author Ma, Shuaicheng
Cao, Yang
Xiong, Li
collection PubMed
description BACKGROUND: Genomic data have been collected by different institutions and companies and need to be shared for broader use. In a cross-site genomic data sharing system, a secure and transparent access control audit module plays an essential role in ensuring accountability. A centralized access log audit system is vulnerable to a single point of attack and also lacks transparency, since the log could be tampered with by a malicious system administrator or internal adversaries. Several studies have proposed blockchain-based access audit to solve this problem, but without considering the efficiency of the audit queries. The first track of the 2018 iDASH competition provided us with an opportunity to design an efficient logging and querying system for cross-site genomic dataset access audit. We designed a blockchain-based log system that can provide a lightweight and widely compatible module for existing blockchain platforms. The submitted solution won third place in the competition. In this paper, we report the technical details of our system. METHODS: We present two methods: a baseline method and an enhanced method. We started with the baseline method and then adjusted our implementation based on the competition evaluation criteria and the characteristics of the log system. To overcome the obstacles to indexing on the immutable blockchain system, we designed a hierarchical timestamp structure that supports efficient range queries on the timestamp field. RESULTS: We implemented our methods in Python3, tested the scalability, and compared the performance using the test data supplied by the competition organizer. We successfully boosted the log retrieval speed for complex AND queries that contain multiple predicates. For the range query, we boosted the speed by at least one order of magnitude. The storage usage is reduced by 25%. CONCLUSION: We demonstrate that blockchain can be used to build a time- and space-efficient logging and querying system for genomic dataset audit trails. Therefore, it provides a promising solution for sharing genomic data with accountability requirements across multiple sites.
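The abstract does not spell out how the hierarchical timestamp structure is laid out. The following is only a minimal sketch of the general idea, not the authors' implementation: each log entry is appended under one bucket key per granularity, and a range query is answered by reading a small set of buckets that tile the range. The `AppendOnlyStore` class, the `LEVELS` widths, and all function names are hypothetical, and the in-memory store merely stands in for an append-only blockchain key-value stream.

```python
from collections import defaultdict

# Bucket widths in seconds, coarsest first (a hypothetical choice for this sketch).
LEVELS = (1000, 100, 10, 1)


class AppendOnlyStore:
    """Stand-in for a blockchain stream: values are appended under a key, never changed."""

    def __init__(self):
        self._items = defaultdict(list)

    def append(self, key, value):
        self._items[key].append(value)

    def get(self, key):
        return list(self._items[key])


def log_access(store, timestamp, record):
    """Append the record once per level, keyed by the bucket that contains it."""
    for width in LEVELS:
        store.append((width, timestamp // width), (timestamp, record))


def cover(lo, hi, levels=LEVELS):
    """Return disjoint bucket keys that together cover the inclusive range [lo, hi]."""
    if lo > hi:
        return []
    width = levels[0]
    if len(levels) == 1:
        # Finest level: take every bucket that touches the range.
        return [(width, b) for b in range(lo // width, hi // width + 1)]
    first = -(-lo // width)   # first bucket that starts at or after lo
    last = (hi + 1) // width  # one past the last bucket that ends at or before hi
    if first >= last:
        # No whole bucket of this width fits, so try the next finer level.
        return cover(lo, hi, levels[1:])
    keys = cover(lo, first * width - 1, levels[1:])    # left edge, finer buckets
    keys += [(width, b) for b in range(first, last)]   # whole buckets in the middle
    keys += cover(last * width, hi, levels[1:])        # right edge, finer buckets
    return keys


def range_query(store, lo, hi):
    """Collect entries with lo <= timestamp <= hi by reading only the covering buckets."""
    hits = []
    for key in cover(lo, hi):
        # The filter is a safeguard in case the finest bucket is wider than one second.
        hits.extend(item for item in store.get(key) if lo <= item[0] <= hi)
    return sorted(hits)
```

With these widths, the range 995 to 2012 is covered by 10 bucket reads instead of one read per second in the range. The trade-off in this sketch is that every entry is written once per level, so it illustrates the query side only and says nothing about the storage figures reported in the abstract.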
format Online
Article
Text
id pubmed-7372873
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7372873 2020-07-21 Efficient logging and querying for blockchain-based cross-site genomic dataset access audit Ma, Shuaicheng Cao, Yang Xiong, Li BMC Med Genomics Research BioMed Central 2020-07-21 /pmc/articles/PMC7372873/ /pubmed/32693835 http://dx.doi.org/10.1186/s12920-020-0725-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Efficient logging and querying for blockchain-based cross-site genomic dataset access audit
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372873/
https://www.ncbi.nlm.nih.gov/pubmed/32693835
http://dx.doi.org/10.1186/s12920-020-0725-y