
Measuring the impact of spatial perturbations on the relationship between data privacy and validity of descriptive statistics

BACKGROUND: Like many scientific fields, epidemiology is addressing issues of research reproducibility. Spatial epidemiology, which often uses the inherently identifiable variable of participant address, must balance reproducibility with participant privacy. In this study, we assess the impact of se...

Bibliographic Details
Main authors: Broen, Kelly; Trangucci, Rob; Zelner, Jon
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7788553/
https://www.ncbi.nlm.nih.gov/pubmed/33413390
http://dx.doi.org/10.1186/s12942-020-00256-8
_version_ 1783633052939845632
author Broen, Kelly
Trangucci, Rob
Zelner, Jon
collection PubMed
description BACKGROUND: Like many scientific fields, epidemiology is addressing issues of research reproducibility. Spatial epidemiology, which often uses the inherently identifiable variable of participant address, must balance reproducibility with participant privacy. In this study, we assess the impact of several different data perturbation methods on key spatial statistics and patient privacy.
METHODS: We analyzed the impact of perturbation on spatial patterns in the full set of address-level mortality data from Lawrence, MA during the period from 1911 to 1913. The original death locations were perturbed using seven different published approaches to stochastic and deterministic spatial data anonymization. Key spatial descriptive statistics were calculated for each perturbation, including changes in spatial pattern center, Global Moran’s I, Local Moran’s I, distance to the k-th nearest neighbors, and the L-function (a normalized form of Ripley’s K). A spatially adapted form of k-anonymity was used to measure the privacy protection conferred by each method, and its compliance with HIPAA and GDPR privacy standards.
RESULTS: Random perturbation at 50 m, donut masking between 5 and 50 m, and Voronoi masking maintain the validity of descriptive spatial statistics better than other perturbations. Grid center masking with both 100 × 100 and 250 × 250 m cells led to large changes in descriptive spatial statistics. None of the perturbation methods adhered to the HIPAA standard that all points have a k-anonymity > 10. All other perturbation methods employed had at least 265 points, or over 6%, not adhering to the HIPAA standard.
CONCLUSIONS: Using the set of published perturbation methods applied in this analysis, HIPAA and GDPR compliant de-identification was not compatible with maintaining key spatial patterns as measured by our chosen summary statistics. Further research should investigate alternate methods to balancing tradeoffs between spatial data privacy and preservation of key patterns in public health data that are of scientific and medical importance.
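As a rough illustration of two ideas named in the abstract, the sketch below applies donut masking (each point is moved by a random distance between an inner and an outer radius) and then computes a simple spatial k-anonymity count. It is not the authors' implementation. The function names `donut_mask` and `spatial_k`, the planar coordinates in metres, the fixed seed, and the specific k definition (the number of original points at least as close to the masked location as the true location is) are all assumptions made for this example; the paper's exact definition may differ.

```python
import math
import random


def donut_mask(points, r_min=5.0, r_max=50.0, seed=0):
    """Displace each (x, y) point, in metres, by a random distance in
    [r_min, r_max] along a uniformly random bearing (hypothetical helper)."""
    rng = random.Random(seed)
    masked = []
    for x, y in points:
        # Drawing the squared radius uniformly spreads points evenly
        # over the area of the ring rather than clustering near r_min.
        r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        masked.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return masked


def spatial_k(points, masked):
    """Illustrative spatial k-anonymity (an assumed definition): for each
    masked point, count the original points that lie at least as close to
    it as its own true location does. The true location always counts."""
    ks = []
    for (tx, ty), (mx, my) in zip(points, masked):
        d_true = math.hypot(tx - mx, ty - my)
        k = sum(
            1 for (px, py) in points
            if math.hypot(px - mx, py - my) <= d_true
        )
        ks.append(k)
    return ks
```

Under this assumed definition, a point would meet the k-anonymity > 10 threshold quoted in the abstract only if more than ten original locations fall within the displacement distance of its masked position, which is why sparse areas are hard to protect with small displacements.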
format Online
Article
Text
id pubmed-7788553
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7788553 2021-01-07 Measuring the impact of spatial perturbations on the relationship between data privacy and validity of descriptive statistics Broen, Kelly Trangucci, Rob Zelner, Jon Int J Health Geogr Research BioMed Central 2021-01-07 /pmc/articles/PMC7788553/ /pubmed/33413390 http://dx.doi.org/10.1186/s12942-020-00256-8 Text en
© The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Measuring the impact of spatial perturbations on the relationship between data privacy and validity of descriptive statistics
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7788553/
https://www.ncbi.nlm.nih.gov/pubmed/33413390
http://dx.doi.org/10.1186/s12942-020-00256-8