
Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage

In order to discover new insights from data, there is a growing need to share information that is distributed across multiple databases that are often held by different organisations. One key task in data integration is the calculation of similarities between records to identify pairs or sets of rec...

Bibliographic Details
Main authors: Ranbaduge, Thilina; Christen, Peter; Schnell, Rainer
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206238/
http://dx.doi.org/10.1007/978-3-030-47436-2_11
_version_ 1783530375023165440
author Ranbaduge, Thilina
Christen, Peter
Schnell, Rainer
collection PubMed
description In order to discover new insights from data, there is a growing need to share information that is distributed across multiple databases that are often held by different organisations. One key task in data integration is the calculation of similarities between records to identify pairs or sets of records that correspond to the same real-world entities. Due to privacy and confidentiality concerns, however, the owners of sensitive databases are often not allowed or willing to exchange or share their data with other organisations to allow such similarity calculations. In this paper we propose a novel privacy-preserving encoding technique that can be used to securely calculate similarities between sensitive values held in different databases. Our technique uses two-step hashing to encode values into an integer set representation that provides strong privacy guarantees and allows accurate similarity calculations. We provide a theoretical analysis of the accuracy and privacy of our encoding technique, and conduct an empirical study on large real databases containing several million records. Our results show that our technique provides high security against privacy attacks and achieves better similarity accuracy compared to two state-of-the-art encoding techniques.
format Online
Article
Text
id pubmed-7206238
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7206238 2020-05-08 Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage Ranbaduge, Thilina Christen, Peter Schnell, Rainer Advances in Knowledge Discovery and Data Mining Article 2020-04-17 /pmc/articles/PMC7206238/ http://dx.doi.org/10.1007/978-3-030-47436-2_11 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206238/
http://dx.doi.org/10.1007/978-3-030-47436-2_11