Cargando…
Some methods for blindfolded record linkage
BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one...
Autores principales: | , |
---|---|
Formato: | Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2004
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC471556/ https://www.ncbi.nlm.nih.gov/pubmed/15222890 http://dx.doi.org/10.1186/1472-6947-4-9 |
_version_ | 1782121629155852288 |
---|---|
author | Churches, Tim Christen, Peter |
author_facet | Churches, Tim Christen, Peter |
author_sort | Churches, Tim |
collection | PubMed |
description | BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects – a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors. METHODS: A method is described which permits the calculation of a general similarity measure, the n-gram score, without having to reveal the data being compared, albeit at some cost in computation and data communication. This method can be combined with public key cryptography and automatic estimation of linkage model parameters to create an overall system for blindfolded record linkage. RESULTS: The system described offers good protection against misdeeds or security failures by any one party, but remains vulnerable to collusion between or simultaneous compromise of two or more parties involved in the linkage operation. In order to reduce the likelihood of this, the use of last-minute allocation of tasks to substitutable servers is proposed. Proof-of-concept computer programmes written in the Python programming language are provided to illustrate the similarity comparison protocol. CONCLUSION: Although the protocols described in this paper are not unconditionally secure, they do suggest the feasibility, with the aid of modern cryptographic techniques and high speed communication networks, of a general purpose probabilistic record linkage system which permits record linkage studies to be carried out with negligible risk of invasion of personal privacy. |
format | Text |
id | pubmed-471556 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4715562004-07-17 Some methods for blindfolded record linkage Churches, Tim Christen, Peter BMC Med Inform Decis Mak Technical Advance BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects – a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors. METHODS: A method is described which permits the calculation of a general similarity measure, the n-gram score, without having to reveal the data being compared, albeit at some cost in computation and data communication. This method can be combined with public key cryptography and automatic estimation of linkage model parameters to create an overall system for blindfolded record linkage. RESULTS: The system described offers good protection against misdeeds or security failures by any one party, but remains vulnerable to collusion between or simultaneous compromise of two or more parties involved in the linkage operation. In order to reduce the likelihood of this, the use of last-minute allocation of tasks to substitutable servers is proposed. Proof-of-concept computer programmes written in the Python programming language are provided to illustrate the similarity comparison protocol. CONCLUSION: Although the protocols described in this paper are not unconditionally secure, they do suggest the feasibility, with the aid of modern cryptographic techniques and high speed communication networks, of a general purpose probabilistic record linkage system which permits record linkage studies to be carried out with negligible risk of invasion of personal privacy. BioMed Central 2004-06-28 /pmc/articles/PMC471556/ /pubmed/15222890 http://dx.doi.org/10.1186/1472-6947-4-9 Text en Copyright © 2004 Churches and Christen; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
spellingShingle | Technical Advance Churches, Tim Christen, Peter Some methods for blindfolded record linkage |
title | Some methods for blindfolded record linkage |
title_full | Some methods for blindfolded record linkage |
title_fullStr | Some methods for blindfolded record linkage |
title_full_unstemmed | Some methods for blindfolded record linkage |
title_short | Some methods for blindfolded record linkage |
title_sort | some methods for blindfolded record linkage |
topic | Technical Advance |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC471556/ https://www.ncbi.nlm.nih.gov/pubmed/15222890 http://dx.doi.org/10.1186/1472-6947-4-9 |
work_keys_str_mv | AT churchestim somemethodsforblindfoldedrecordlinkage AT christenpeter somemethodsforblindfoldedrecordlinkage |