Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage
In order to discover new insights from data, there is a growing need to share information that is distributed across multiple databases that are often held by different organisations. One key task in data integration is the calculation of similarities between records to identify pairs or sets of rec...
Main Authors: | Ranbaduge, Thilina; Christen, Peter; Schnell, Rainer |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206238/ http://dx.doi.org/10.1007/978-3-030-47436-2_11 |
_version_ | 1783530375023165440 |
---|---|
author | Ranbaduge, Thilina Christen, Peter Schnell, Rainer |
author_facet | Ranbaduge, Thilina Christen, Peter Schnell, Rainer |
author_sort | Ranbaduge, Thilina |
collection | PubMed |
description | In order to discover new insights from data, there is a growing need to share information that is distributed across multiple databases that are often held by different organisations. One key task in data integration is the calculation of similarities between records to identify pairs or sets of records that correspond to the same real-world entities. Due to privacy and confidentiality concerns, however, the owners of sensitive databases are often not allowed or willing to exchange or share their data with other organisations to allow such similarity calculations. In this paper we propose a novel privacy-preserving encoding technique that can be used to securely calculate similarities between sensitive values held in different databases. Our technique uses two-step hashing to encode values into an integer set representation that provides strong privacy guarantees and allows accurate similarity calculations. We provide a theoretical analysis of the accuracy and privacy of our encoding technique, and conduct an empirical study on large real databases containing several million records. Our results show that our technique provides high security against privacy attacks and achieves better similarity accuracy compared to two state-of-the-art encoding techniques. |
format | Online Article Text |
id | pubmed-7206238 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-72062382020-05-08 Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage Ranbaduge, Thilina Christen, Peter Schnell, Rainer Advances in Knowledge Discovery and Data Mining Article In order to discover new insights from data, there is a growing need to share information that is distributed across multiple databases that are often held by different organisations. One key task in data integration is the calculation of similarities between records to identify pairs or sets of records that correspond to the same real-world entities. Due to privacy and confidentiality concerns, however, the owners of sensitive databases are often not allowed or willing to exchange or share their data with other organisations to allow such similarity calculations. In this paper we propose a novel privacy-preserving encoding technique that can be used to securely calculate similarities between sensitive values held in different databases. Our technique uses two-step hashing to encode values into an integer set representation that provides strong privacy guarantees and allows accurate similarity calculations. We provide a theoretical analysis of the accuracy and privacy of our encoding technique, and conduct an empirical study on large real databases containing several million records. Our results show that our technique provides high security against privacy attacks and achieves better similarity accuracy compared to two state-of-the-art encoding techniques. 2020-04-17 /pmc/articles/PMC7206238/ http://dx.doi.org/10.1007/978-3-030-47436-2_11 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Ranbaduge, Thilina Christen, Peter Schnell, Rainer Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage |
title | Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage |
title_full | Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage |
title_fullStr | Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage |
title_full_unstemmed | Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage |
title_short | Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage |
title_sort | secure and accurate two-step hash encoding for privacy-preserving record linkage |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206238/ http://dx.doi.org/10.1007/978-3-030-47436-2_11 |
work_keys_str_mv | AT ranbadugethilina secureandaccuratetwostephashencodingforprivacypreservingrecordlinkage AT christenpeter secureandaccuratetwostephashencodingforprivacypreservingrecordlinkage AT schnellrainer secureandaccuratetwostephashencodingforprivacypreservingrecordlinkage |