A Systematic Review of Re-Identification Attacks on Health Data

BACKGROUND: Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide suff...

Bibliographic Details
Main Authors: El Emam, Khaled; Jonker, Elizabeth; Arbuckle, Luk; Malin, Bradley
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3229505/
https://www.ncbi.nlm.nih.gov/pubmed/22164229
http://dx.doi.org/10.1371/journal.pone.0028071
_version_ 1782217951585239040
author El Emam, Khaled
Jonker, Elizabeth
Arbuckle, Luk
Malin, Bradley
author_facet El Emam, Khaled
Jonker, Elizabeth
Arbuckle, Luk
Malin, Bradley
author_sort El Emam, Khaled
collection PubMed
description BACKGROUND: Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have significant and important implications on how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast that to re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these demonstrate weaknesses in current de-identification methods. METHODS AND FINDINGS: Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26 with 95% CI 0.046–0.478) and 0.34 for attacks on health data (95% CI 0–0.744). There was considerable uncertainty around the proportions as evidenced by the wide confidence intervals, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of fourteen attacks were performed with data that was de-identified using existing standards. Only one of these attacks was on health data, which resulted in a success rate of 0.00013. CONCLUSIONS: The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.
format Online
Article
Text
id pubmed-3229505
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3229505 2011-12-07 A Systematic Review of Re-Identification Attacks on Health Data El Emam, Khaled Jonker, Elizabeth Arbuckle, Luk Malin, Bradley PLoS One Research Article BACKGROUND: Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. Should this be the case, it would have significant and important implications on how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast that to re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these demonstrate weaknesses in current de-identification methods. METHODS AND FINDINGS: Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26 with 95% CI 0.046–0.478) and 0.34 for attacks on health data (95% CI 0–0.744). There was considerable uncertainty around the proportions as evidenced by the wide confidence intervals, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of fourteen attacks were performed with data that was de-identified using existing standards. Only one of these attacks was on health data, which resulted in a success rate of 0.00013. CONCLUSIONS: The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods. Public Library of Science 2011-12-02 /pmc/articles/PMC3229505/ /pubmed/22164229 http://dx.doi.org/10.1371/journal.pone.0028071 Text en El Emam et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
El Emam, Khaled
Jonker, Elizabeth
Arbuckle, Luk
Malin, Bradley
A Systematic Review of Re-Identification Attacks on Health Data
title A Systematic Review of Re-Identification Attacks on Health Data
title_full A Systematic Review of Re-Identification Attacks on Health Data
title_fullStr A Systematic Review of Re-Identification Attacks on Health Data
title_full_unstemmed A Systematic Review of Re-Identification Attacks on Health Data
title_short A Systematic Review of Re-Identification Attacks on Health Data
title_sort systematic review of re-identification attacks on health data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3229505/
https://www.ncbi.nlm.nih.gov/pubmed/22164229
http://dx.doi.org/10.1371/journal.pone.0028071
work_keys_str_mv AT elemamkhaled asystematicreviewofreidentificationattacksonhealthdata
AT jonkerelizabeth asystematicreviewofreidentificationattacksonhealthdata
AT arbuckleluk asystematicreviewofreidentificationattacksonhealthdata
AT malinbradley asystematicreviewofreidentificationattacksonhealthdata
AT elemamkhaled systematicreviewofreidentificationattacksonhealthdata
AT jonkerelizabeth systematicreviewofreidentificationattacksonhealthdata
AT arbuckleluk systematicreviewofreidentificationattacksonhealthdata
AT malinbradley systematicreviewofreidentificationattacksonhealthdata