Privacy preserving data visualizations
Data visualizations are a valuable tool used during both statistical analysis and the interpretation of results as they graphically reveal useful information about the structure, properties and relationships between variables, which may otherwise be concealed in tabulated data. In disciplines like m...
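Main authors: | Avraam, Demetris; Wilson, Rebecca; Butters, Oliver; Burton, Thomas; Nicolaides, Christos; Jones, Elinor; Boyd, Andy; Burton, Paul |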
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7790778/ https://www.ncbi.nlm.nih.gov/pubmed/33442528 http://dx.doi.org/10.1140/epjds/s13688-020-00257-4 |
_version_ | 1783633493285142528 |
---|---|
author | Avraam, Demetris Wilson, Rebecca Butters, Oliver Burton, Thomas Nicolaides, Christos Jones, Elinor Boyd, Andy Burton, Paul |
author_facet | Avraam, Demetris Wilson, Rebecca Butters, Oliver Burton, Thomas Nicolaides, Christos Jones, Elinor Boyd, Andy Burton, Paul |
author_sort | Avraam, Demetris |
collection | PubMed |
description | Data visualizations are a valuable tool used during both statistical analysis and the interpretation of results as they graphically reveal useful information about the structure, properties and relationships between variables, which may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants, the sharing and publication of individual-level records is controlled by data protection laws and ethico-legal norms. Thus, as data visualizations – such as graphs and plots – may be linked to other released information and used to identify study participants and their personal attributes, their creation is often prohibited by the terms of data use. These restrictions are enforced to reduce the risk of breaching data subject confidentiality, however they limit analysts from displaying useful descriptive plots for their research features and findings. Here we propose the use of anonymization techniques to generate privacy-preserving visualizations that retain the statistical properties of the underlying data while still adhering to strict data disclosure rules. We demonstrate the use of (i) the well-known k-anonymization process which preserves privacy by reducing the granularity of the data using suppression and generalization, (ii) a novel deterministic approach that replaces individual-level observations with the centroids of each k nearest neighbours, and (iii) a probabilistic procedure that perturbs individual attributes with the addition of random stochastic noise. We apply the proposed methods to generate privacy-preserving data visualizations for exploratory data analysis and inferential regression plot diagnostics, and we discuss their strengths and limitations. |
format | Online Article Text |
id | pubmed-7790778 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-7790778 2021-01-11 Privacy preserving data visualizations Avraam, Demetris Wilson, Rebecca Butters, Oliver Burton, Thomas Nicolaides, Christos Jones, Elinor Boyd, Andy Burton, Paul EPJ Data Sci Regular Article Data visualizations are a valuable tool used during both statistical analysis and the interpretation of results as they graphically reveal useful information about the structure, properties and relationships between variables, which may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants, the sharing and publication of individual-level records is controlled by data protection laws and ethico-legal norms. Thus, as data visualizations – such as graphs and plots – may be linked to other released information and used to identify study participants and their personal attributes, their creation is often prohibited by the terms of data use. These restrictions are enforced to reduce the risk of breaching data subject confidentiality, however they limit analysts from displaying useful descriptive plots for their research features and findings. Here we propose the use of anonymization techniques to generate privacy-preserving visualizations that retain the statistical properties of the underlying data while still adhering to strict data disclosure rules. We demonstrate the use of (i) the well-known k-anonymization process which preserves privacy by reducing the granularity of the data using suppression and generalization, (ii) a novel deterministic approach that replaces individual-level observations with the centroids of each k nearest neighbours, and (iii) a probabilistic procedure that perturbs individual attributes with the addition of random stochastic noise. We apply the proposed methods to generate privacy-preserving data visualizations for exploratory data analysis and inferential regression plot diagnostics, and we discuss their strengths and limitations. Springer Berlin Heidelberg 2021-01-07 2021 /pmc/articles/PMC7790778/ /pubmed/33442528 http://dx.doi.org/10.1140/epjds/s13688-020-00257-4 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Regular Article Avraam, Demetris Wilson, Rebecca Butters, Oliver Burton, Thomas Nicolaides, Christos Jones, Elinor Boyd, Andy Burton, Paul Privacy preserving data visualizations |
title | Privacy preserving data visualizations |
title_full | Privacy preserving data visualizations |
title_fullStr | Privacy preserving data visualizations |
title_full_unstemmed | Privacy preserving data visualizations |
title_short | Privacy preserving data visualizations |
title_sort | privacy preserving data visualizations |
topic | Regular Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7790778/ https://www.ncbi.nlm.nih.gov/pubmed/33442528 http://dx.doi.org/10.1140/epjds/s13688-020-00257-4 |
work_keys_str_mv | AT avraamdemetris privacypreservingdatavisualizations AT wilsonrebecca privacypreservingdatavisualizations AT buttersoliver privacypreservingdatavisualizations AT burtonthomas privacypreservingdatavisualizations AT nicolaideschristos privacypreservingdatavisualizations AT joneselinor privacypreservingdatavisualizations AT boydandy privacypreservingdatavisualizations AT burtonpaul privacypreservingdatavisualizations |
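
The description field above names three anonymization approaches: generalization with suppression, replacement of observations by the centroids of their k nearest neighbours, and perturbation with random noise. The following is a minimal Python sketch of what each step could look like on small numeric data. It is not the authors' implementation. The function names, the Euclidean distance, the choice to count a point among its own neighbours, and the `fraction` noise parameter are all assumptions made for illustration; the record itself gives no such details.

```python
import math
import random


def generalize_and_suppress(values, width, k):
    """Bin values into intervals of `width` and drop bins with fewer than k members.

    Illustrative assumption: generalization is fixed-width binning and
    suppression removes any bin whose count is below k.
    Returns a dict mapping each kept bin's lower edge to its count.
    """
    counts = {}
    for v in values:
        lower = math.floor(v / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return {lower: n for lower, n in sorted(counts.items()) if n >= k}


def knn_centroids(points, k):
    """Replace each 2-D point with the centroid of its k nearest neighbours.

    Illustrative assumptions: the point itself counts as one of the k
    neighbours and distances are plain Euclidean on unscaled variables.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    masked = []
    for p in points:
        nearest = sorted(points, key=lambda q: math.dist(p, q))[:k]
        cx = sum(q[0] for q in nearest) / k
        cy = sum(q[1] for q in nearest) / k
        masked.append((cx, cy))
    return masked


def add_noise(values, fraction=0.25, seed=None):
    """Perturb each value with zero-mean Gaussian noise.

    Illustrative assumption: the noise standard deviation is `fraction`
    times the sample standard deviation of `values` (hypothetical knob).
    """
    rng = random.Random(seed)
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [v + rng.gauss(0.0, fraction * sd) for v in values]


if __name__ == "__main__":
    ages = [21, 23, 25, 38, 61]
    print(generalize_and_suppress(ages, width=10, k=2))
    print(knn_centroids([(0, 0), (1, 0), (10, 10)], k=2))
    print(add_noise(ages, fraction=0.25, seed=1))
```

In a plotting workflow, the masked values returned by these functions would be drawn in place of the individual-level records. Larger `k` or `fraction` values trade more distortion of the plot for stronger protection.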