Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model
BACKGROUND: The linking of administrative data across agencies provides the capability to investigate many health and social issues with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage...
Main Authors: | Brown, Adrian Paul; Randall, Sean M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2020 |
Subjects: | Original Paper |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7542414/ https://www.ncbi.nlm.nih.gov/pubmed/32965236 http://dx.doi.org/10.2196/18920 |
_version_ | 1783591544896356352 |
---|---|
author | Brown, Adrian Paul; Randall, Sean M |
author_facet | Brown, Adrian Paul; Randall, Sean M |
author_sort | Brown, Adrian Paul |
collection | PubMed |
description | BACKGROUND: The linking of administrative data across agencies provides the capability to investigate many health and social issues with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage of identifiable information on cloud infrastructure assessed as high risk by data custodians. OBJECTIVE: This study aims to present a model for record linkage that utilizes cloud computing capabilities while assuring custodians that identifiable data sets remain secure and local. METHODS: A new hybrid cloud model was developed, including privacy-preserving record linkage techniques and container-based batch processing. An evaluation of this model was conducted with a prototype implementation using large synthetic data sets representative of administrative health data. RESULTS: The cloud model keeps identifiers on premises and uses privacy-preserved identifiers to run all linkage computations on cloud infrastructure. Our prototype used a managed container cluster in Amazon Web Services to distribute the computation using existing linkage software. Although the cost of computation was relatively low, the use of existing software resulted in a processing overhead of 35.7% (149 of 417 min of execution time). CONCLUSIONS: The results of our experimental evaluation show the operational feasibility of such a model and the exciting opportunities for advancing the analysis of linkage outputs. |
format | Online Article Text |
id | pubmed-7542414 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7542414 2020-10-20 Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model Brown, Adrian Paul; Randall, Sean M JMIR Med Inform Original Paper JMIR Publications 2020-09-23 /pmc/articles/PMC7542414/ /pubmed/32965236 http://dx.doi.org/10.2196/18920 Text en ©Adrian Paul Brown, Sean M Randall. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 23.09.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
title | Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model |
topic | Original Paper |
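
The abstract above mentions privacy-preserving record linkage but this record does not say which technique the authors used. As an illustration only, the sketch below assumes one commonly used approach: each identifier is split into character bigrams, the bigrams are hashed with a secret key into Bloom filter bit positions on premises, and only those positions are compared (here with the Dice coefficient) on cloud infrastructure. The names `FILTER_BITS`, `NUM_HASHES`, `bigrams`, `encode`, and `dice` are hypothetical and are not taken from the paper or from any linkage software.

```python
import hashlib
import hmac

# Assumed parameters for illustration; real deployments tune these.
FILTER_BITS = 1024   # length of the Bloom filter in bits
NUM_HASHES = 20      # hash functions applied to each bigram


def bigrams(value: str) -> set[str]:
    """Split a normalized identifier into padded character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str, secret: bytes) -> set[int]:
    """Return the set bit positions of a keyed Bloom filter for `value`.

    This step would run on premises; only the positions leave the site.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(NUM_HASHES):
            message = f"{k}:{gram}".encode("utf-8")
            digest = hmac.new(secret, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two encodings; 1.0 means identical bit sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical key, kept by the data custodians and never sent to the cloud.
key = b"shared-secret-held-on-premises"
smith = encode("smith", key)
smyth = encode("smyth", key)
jones = encode("jones", key)
print(dice(smith, smyth), dice(smith, jones))
```

Under these assumptions, similar spellings such as "smith" and "smyth" share most of their bigrams and therefore score higher than unrelated names, which is what allows approximate matching without exposing the original identifiers.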