Cargando…

Some methods for blindfolded record linkage

BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one...

Descripción completa

Detalles Bibliográficos
Autores principales: Churches, Tim, Christen, Peter
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2004
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC471556/
https://www.ncbi.nlm.nih.gov/pubmed/15222890
http://dx.doi.org/10.1186/1472-6947-4-9
_version_ 1782121629155852288
author Churches, Tim
Christen, Peter
author_facet Churches, Tim
Christen, Peter
author_sort Churches, Tim
collection PubMed
description BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects – a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors. METHODS: A method is described which permits the calculation of a general similarity measure, the n-gram score, without having to reveal the data being compared, albeit at some cost in computation and data communication. This method can be combined with public key cryptography and automatic estimation of linkage model parameters to create an overall system for blindfolded record linkage. RESULTS: The system described offers good protection against misdeeds or security failures by any one party, but remains vulnerable to collusion between or simultaneous compromise of two or more parties involved in the linkage operation. In order to reduce the likelihood of this, the use of last-minute allocation of tasks to substitutable servers is proposed. Proof-of-concept computer programmes written in the Python programming language are provided to illustrate the similarity comparison protocol. CONCLUSION: Although the protocols described in this paper are not unconditionally secure, they do suggest the feasibility, with the aid of modern cryptographic techniques and high speed communication networks, of a general purpose probabilistic record linkage system which permits record linkage studies to be carried out with negligible risk of invasion of personal privacy.
format Text
id pubmed-471556
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4715562004-07-17 Some methods for blindfolded record linkage Churches, Tim Christen, Peter BMC Med Inform Decis Mak Technical Advance BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects – a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors. METHODS: A method is described which permits the calculation of a general similarity measure, the n-gram score, without having to reveal the data being compared, albeit at some cost in computation and data communication. This method can be combined with public key cryptography and automatic estimation of linkage model parameters to create an overall system for blindfolded record linkage. RESULTS: The system described offers good protection against misdeeds or security failures by any one party, but remains vulnerable to collusion between or simultaneous compromise of two or more parties involved in the linkage operation. In order to reduce the likelihood of this, the use of last-minute allocation of tasks to substitutable servers is proposed. Proof-of-concept computer programmes written in the Python programming language are provided to illustrate the similarity comparison protocol. CONCLUSION: Although the protocols described in this paper are not unconditionally secure, they do suggest the feasibility, with the aid of modern cryptographic techniques and high speed communication networks, of a general purpose probabilistic record linkage system which permits record linkage studies to be carried out with negligible risk of invasion of personal privacy. BioMed Central 2004-06-28 /pmc/articles/PMC471556/ /pubmed/15222890 http://dx.doi.org/10.1186/1472-6947-4-9 Text en Copyright © 2004 Churches and Christen; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Technical Advance
Churches, Tim
Christen, Peter
Some methods for blindfolded record linkage
title Some methods for blindfolded record linkage
title_full Some methods for blindfolded record linkage
title_fullStr Some methods for blindfolded record linkage
title_full_unstemmed Some methods for blindfolded record linkage
title_short Some methods for blindfolded record linkage
title_sort some methods for blindfolded record linkage
topic Technical Advance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC471556/
https://www.ncbi.nlm.nih.gov/pubmed/15222890
http://dx.doi.org/10.1186/1472-6947-4-9
work_keys_str_mv AT churchestim somemethodsforblindfoldedrecordlinkage
AT christenpeter somemethodsforblindfoldedrecordlinkage