Re-identification of home addresses from spatial locations anonymized by Gaussian skew

Bibliographic Details
Main authors: Cassa, Christopher A, Wieland, Shannon C, Mandl, Kenneth D
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2526988/
https://www.ncbi.nlm.nih.gov/pubmed/18700031
http://dx.doi.org/10.1186/1476-072X-7-45
author Cassa, Christopher A
Wieland, Shannon C
Mandl, Kenneth D
collection PubMed
description BACKGROUND: Knowledge of the geographical locations of individuals is fundamental to the practice of spatial epidemiology. One approach to preserving the privacy of individual-level addresses in a data set is to de-identify the data using a non-deterministic blurring algorithm that shifts the geocoded values. We investigate a vulnerability in this approach that enables an adversary to re-identify individuals using multiple anonymized versions of the original data set. If several such versions are available, each can be used to incrementally refine estimates of the original geocoded location. RESULTS: We produce multiple anonymized data sets from a single set of addresses and then progressively average the anonymized results for each address, characterizing the steep decline in distance from the re-identified point to the original location (and the corresponding reduction in privacy). With ten anonymized copies of an original data set, the average distance between the estimated, re-identified address and the original address falls substantially, from 0.7 km to 0.2 km. With fifty anonymized copies, it falls from 0.7 km to 0.1 km. CONCLUSION: We demonstrate that multiple versions of the same data, each anonymized by non-deterministic Gaussian skew, can be used to ascertain original geographic locations. We explore solutions to this problem, including infrastructure to support the safe disclosure of anonymized medical data so as to prevent inference or re-identification of original address data, and the use of a Markov-process-based algorithm to mitigate this risk.
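The averaging attack described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' code, and it rests on stated assumptions: each anonymized copy displaces an address by independent, zero-mean, isotropic Gaussian noise with per-axis standard deviation sigma, and coordinates are treated as planar kilometres. Under those assumptions the centroid of n copies has per-axis standard deviation sigma/sqrt(n), so its expected distance from the true point is sigma*sqrt(pi/2)/sqrt(n). Starting from the abstract's 0.7 km single-copy figure, that scaling gives about 0.7/sqrt(10) ≈ 0.22 km for ten copies and 0.7/sqrt(50) ≈ 0.10 km for fifty, which is in line with the reported 0.2 km and 0.1 km. The function names and the value of SIGMA_KM (chosen so that one copy gives a mean error of about 0.7 km) are hypothetical.

```python
import math
import random

def gaussian_skew(x_km, y_km, sigma_km, rng):
    """Hypothetical blurring step: shift a planar point by independent
    zero-mean Gaussian noise on each axis (std dev sigma_km)."""
    return (x_km + rng.gauss(0.0, sigma_km), y_km + rng.gauss(0.0, sigma_km))

def averaging_attack(copies):
    """Estimate the original point as the centroid of its anonymized copies."""
    n = len(copies)
    return (sum(x for x, _ in copies) / n, sum(y for _, y in copies) / n)

def mean_reid_error(n_copies, sigma_km, n_points=2000, seed=0):
    """Average distance (km) between the centroid estimate and the true point,
    over n_points randomly placed hypothetical addresses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_points):
        # Hypothetical original address somewhere in a 50 km x 50 km region.
        x0, y0 = rng.uniform(0.0, 50.0), rng.uniform(0.0, 50.0)
        copies = [gaussian_skew(x0, y0, sigma_km, rng) for _ in range(n_copies)]
        xe, ye = averaging_attack(copies)
        total += math.hypot(xe - x0, ye - y0)
    return total / n_points

# Hypothetical sigma: mean of a 2-D Gaussian offset is sigma * sqrt(pi / 2),
# so this choice makes a single copy miss by about 0.7 km on average.
SIGMA_KM = 0.7 / math.sqrt(math.pi / 2)

for n in (1, 10, 50):
    print(n, round(mean_reid_error(n, SIGMA_KM), 3))
```

The simulated errors shrink roughly as 1/sqrt(n), which shows why releasing several independently blurred versions of the same addresses erodes the protection each version offers on its own.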
format Text
id pubmed-2526988
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2526988 2008-08-29 Re-identification of home addresses from spatial locations anonymized by Gaussian skew Cassa, Christopher A Wieland, Shannon C Mandl, Kenneth D Int J Health Geogr Research
BioMed Central 2008-08-12 /pmc/articles/PMC2526988/ /pubmed/18700031 http://dx.doi.org/10.1186/1476-072X-7-45 Text en Copyright © 2008 Cassa et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Re-identification of home addresses from spatial locations anonymized by Gaussian skew
topic Research