Privacy preserving linkage using multiple match-keys

Bibliographic Details
Main authors: Randall, SM, Brown, AP, Ferrante, AM, Boyd, JH
Format: Online Article Text
Language: English
Published: Swansea University, 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7482515/
https://www.ncbi.nlm.nih.gov/pubmed/32935028
http://dx.doi.org/10.23889/ijpds.v4i1.1094
Collection: PubMed
Description:
INTRODUCTION: Available and practical methods for privacy-preserving linkage have shortcomings: methods using anonymous linkage codes provide limited accuracy, while methods based on Bloom filters have proven vulnerable to frequency-based attacks.
OBJECTIVES: In this paper, we present and evaluate a novel protocol that aims to combine the accuracy of the Bloom filter method with the privacy achievable through the anonymous linkage code methodology.
METHODS: The protocol involves creating multiple match-keys for each record, with the composition of each match-key depending on attributes of the underlying datasets being compared. The protocol was evaluated through de-duplication of four administrative datasets and two synthetic datasets; the ‘answers’ outlining which records belonged to the same individual were known for each dataset. The results were compared against results achieved with un-encoded linkage and other privacy-preserving techniques on the same datasets.
RESULTS: The multiple match-key protocol presented here achieved high linkage quality across all datasets, performing better than record-level Bloom filters and the SLK (statistical linkage key), but worse than field-level Bloom filters.
CONCLUSION: The presented method provides high linkage quality while avoiding the frequency-based attacks that have been demonstrated against the Bloom filter approach. The method appears promising for real-world use.
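The METHODS summary describes generating several match-keys per record and comparing records on them. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the field names, the four match-key compositions, the HMAC-SHA256 keyed hash and the shared secret are all hypothetical, and linking two records whenever they agree on any one match-key is an assumed simplification. The sketch also scores a de-duplication run against known ‘answers’ using pairwise precision, recall and F-measure; these are common linkage-quality measures but are assumed here rather than taken from the record.

```python
import hashlib
import hmac
from itertools import combinations

# Hypothetical match-key compositions (tuples of field names). The protocol
# chooses compositions from attributes of the datasets; these are illustrative.
MATCH_KEY_FIELDS = [
    ("first_name", "last_name", "dob", "sex"),
    ("first_name", "last_name", "dob"),
    ("last_name", "dob", "sex", "postcode"),
    ("first_name", "dob", "sex", "postcode"),
]

SECRET = b"hypothetical-shared-secret"  # assumed key held by data custodians


def normalise(value: str) -> str:
    """Lower-case and keep only alphanumeric characters before hashing."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_keys(record: dict) -> list:
    """Return one HMAC-SHA256 digest per match-key composition."""
    keys = []
    for i, fields in enumerate(MATCH_KEY_FIELDS):
        plain = "|".join(normalise(record[f]) for f in fields)
        # Prefix the composition index so different key types never collide.
        keys.append(hmac.new(SECRET, f"{i}:{plain}".encode(), hashlib.sha256).hexdigest())
    return keys


def link_pairs(records: dict) -> set:
    """Return record-ID pairs that agree on at least one match-key (assumed rule)."""
    buckets = {}
    for rid, rec in records.items():
        for key in match_keys(rec):
            buckets.setdefault(key, set()).add(rid)
    return {pair for ids in buckets.values() for pair in combinations(sorted(ids), 2)}


def true_pairs(truth: dict) -> set:
    """Expand a record-ID -> person-ID 'answer' mapping into true matching pairs."""
    by_person = {}
    for rid, pid in truth.items():
        by_person.setdefault(pid, []).append(rid)
    return {pair for rids in by_person.values() for pair in combinations(sorted(rids), 2)}


def quality(found: set, truth: set) -> tuple:
    """Pairwise precision, recall and F-measure."""
    tp = len(found & truth)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical toy dataset with known answers.
records = {
    "r1": {"first_name": "Jane", "last_name": "Smith", "dob": "1980-02-14", "sex": "F", "postcode": "6000"},
    "r2": {"first_name": "Jane", "last_name": "Smyth", "dob": "1980-02-14", "sex": "F", "postcode": "6000"},
    "r3": {"first_name": "John", "last_name": "Smith", "dob": "1975-07-01", "sex": "M", "postcode": "6100"},
    "r4": {"first_name": "Jon", "last_name": "Smith", "dob": "1975-07-01", "sex": "M", "postcode": "6100"},
    "r5": {"first_name": "Jane", "last_name": "Brown", "dob": "1980-02-14", "sex": "F", "postcode": "6000"},
}
truth = {"r1": "p1", "r2": "p1", "r3": "p2", "r4": "p2", "r5": "p3"}

found = link_pairs(records)
print(sorted(found))
print(quality(found, true_pairs(truth)))
```

In this toy data the fourth composition recovers the surname typo in r2, and the third recovers the first-name variant in r4, but the fourth also links the unrelated r5 to r1 and r2. That trade-off is why the choice of match-key compositions matters.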
ID: pubmed-7482515
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Int J Popul Data Sci (Swansea University), published 2019-05-23
License: This work is licenced under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licences/by/4.0/).
Topic: Population Data Science