A Privacy Attack on Multiple Dynamic Match-key based Privacy-Preserving Record Linkage

Bibliographic Details
Main Authors: Vidanage, A, Ranbaduge, T, Christen, P, Randall, S
Format: Online Article Text
Language: English
Published: Swansea University 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7893850/
https://www.ncbi.nlm.nih.gov/pubmed/33644410
http://dx.doi.org/10.23889/ijpds.v5i1.1345
collection PubMed
description INTRODUCTION: Over the last decade, the demand for linking records about people across databases has increased in various domains. Privacy challenges associated with linking sensitive information led to the development of privacy-preserving record linkage techniques. The multiple dynamic match-key encoding approach recently proposed by Randall et al. (IJPDS, 2019) is such a technique, aimed at providing sufficient privacy for linkage applications while obtaining high linkage quality. However, the use of this encoding in large databases can reveal frequency information that can allow the re-identification of encoded values. OBJECTIVES: We propose a frequency-based attack to evaluate the privacy guarantees of multiple dynamic match-key encoding. We then present two recommendations that can be used in this match-key encoding approach to prevent such a privacy attack. METHODS: The proposed attack analyses the frequency distributions of individual match-keys in order to identify the attributes used for each match-key, where we assume the adversary has access to a plain-text database with characteristics similar to those of the encoded database. We employ a set of statistical correlation tests to compare the frequency distributions of match-key values between the encoded and plain-text databases. Once the attribute combinations used for match-keys are discovered, we then re-identify encoded sensitive values by utilising a frequency alignment method. Next, we propose two recommendations: one to alter the original frequency distributions and another to make the frequency distributions uniform. Both help to prevent frequency-based attacks. RESULTS: We evaluate our privacy attack using two large real-world databases. The results show that in certain situations the attack can successfully re-identify a set of sensitive values encoded using the multiple dynamic match-key encoding approach. On the databases used in our experiments, the attack can re-identify plain-text values with both precision and recall of up to 98%. Furthermore, we show that our proposed recommendations make this attack harder to perform with only a small reduction in linkage quality. CONCLUSIONS: Our proposed privacy attack demonstrates weaknesses of multiple match-key encoding that should be taken into consideration when linking databases that contain sensitive personal information. Our proposed recommendations ensure that the multiple dynamic match-key encoding approach can be used securely while retaining high linkage quality.
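The METHODS summary describes two steps: first matching the frequency distribution of an encoded match-key against candidate attribute combinations from a plain-text database, then aligning encoded and plain-text values by frequency. The Python sketch below illustrates that general idea on hypothetical toy data; it is not the authors' implementation. All of the following are assumptions for illustration: `encode_match_key` uses HMAC-SHA256 over `|`-joined attribute values as a stand-in for the Randall et al. encoding; a single Pearson correlation over zero-padded, rank-ordered relative frequencies replaces the paper's set of statistical correlation tests; the tie rule in `frequency_align` is a simplification; and the attacker's plain-text database `RECORDS` is identical to the encoded one, which is a best case for the adversary.

```python
import hashlib
import hmac
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy database. The attacker's plain-text copy is assumed to
# have the same frequency characteristics (here it is literally identical).
RECORDS = [
    {"first": "john", "last": "smith", "zip": "2600"},
    {"first": "john", "last": "smith", "zip": "2600"},
    {"first": "john", "last": "lee", "zip": "2601"},
    {"first": "john", "last": "smith", "zip": "2600"},
    {"first": "mary", "last": "lee", "zip": "2600"},
    {"first": "mary", "last": "smith", "zip": "2601"},
    {"first": "mary", "last": "brown", "zip": "2600"},
    {"first": "anna", "last": "lee", "zip": "2601"},
    {"first": "anna", "last": "smith", "zip": "2600"},
    {"first": "paul", "last": "brown", "zip": "2601"},
]

SECRET_KEY = b"unknown-to-attacker"  # hypothetical key held by the data custodian


def encode_match_key(record, attrs):
    """Hypothetical match-key: keyed hash of the concatenated attribute values."""
    value = "|".join(record[a] for a in attrs)
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()


def sorted_freqs(values):
    """Relative frequencies of the distinct values, largest first."""
    counts = Counter(values)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def pearson(x, y):
    """Pearson correlation after zero-padding the shorter vector."""
    n = max(len(x), len(y))
    x = x + [0.0] * (n - len(x))
    y = y + [0.0] * (n - len(y))
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def guess_attributes(encoded_keys, plain_records, attrs, size=2):
    """Step 1: score each attribute combination by how well its plain-text
    frequency distribution correlates with the encoded one; return the best."""
    target = sorted_freqs(encoded_keys)
    scores = {}
    for combo in combinations(attrs, size):
        plain = ["|".join(r[a] for a in combo) for r in plain_records]
        scores[combo] = pearson(target, sorted_freqs(plain))
    return max(scores, key=scores.get), scores


def frequency_align(encoded_keys, plain_values):
    """Step 2: pair encoded and plain-text values of the same frequency rank,
    keeping only ranks whose count occurs exactly once in both lists, since
    tied counts make the pairing ambiguous."""
    enc = Counter(encoded_keys).most_common()
    pln = Counter(plain_values).most_common()
    enc_ties = Counter(c for _, c in enc)
    pln_ties = Counter(c for _, c in pln)
    mapping = {}
    for (e, ec), (p, pc) in zip(enc, pln):
        if enc_ties[ec] == 1 and pln_ties[pc] == 1:
            mapping[e] = p
    return mapping


if __name__ == "__main__":
    encoded = [encode_match_key(r, ("first", "last")) for r in RECORDS]
    combo, scores = guess_attributes(encoded, RECORDS, ["first", "last", "zip"])
    plain = ["|".join(r[a] for a in combo) for r in RECORDS]
    print(combo, scores)
    print(frequency_align(encoded, plain))
```

On this toy data the correct combination `("first", "last")` reaches a correlation of 1.0, yet frequency alignment recovers only `john|smith`, the single value whose count (3) is not tied; the other seven values each occur once and remain ambiguous. This illustrates why flattening frequency distributions, as in the second recommendation, can make such alignment harder.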
id pubmed-7893850
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7893850 2021-02-26 A Privacy Attack on Multiple Dynamic Match-key based Privacy-Preserving Record Linkage. Vidanage, A; Ranbaduge, T; Christen, P; Randall, S. Int J Popul Data Sci (Population Data Science). Swansea University 2020-08-11. /pmc/articles/PMC7893850/ /pubmed/33644410 http://dx.doi.org/10.23889/ijpds.v5i1.1345 Text en https://creativecommons.org/licences/by/4.0/ This work is licenced under a Creative Commons Attribution 4.0 International License.
topic Population Data Science