
A scalable software solution for anonymizing high-dimensional biomedical data

BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming the data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging and particularly difficult when processing c...


Bibliographic Details
Main Authors: Meurers, Thierry; Bild, Raffael; Do, Kieu-Mi; Prasser, Fabian
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Technical Note
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8489190/
https://www.ncbi.nlm.nih.gov/pubmed/34605868
http://dx.doi.org/10.1093/gigascience/giab068
author Meurers, Thierry
Bild, Raffael
Do, Kieu-Mi
Prasser, Fabian
collection PubMed
description BACKGROUND: Data anonymization is an important building block for ensuring privacy, and it fosters the reuse of data. However, transforming data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging, particularly when processing complex datasets that contain a large number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional biomedical datasets. FINDINGS: To improve ARX's capability to find optimal transformations when processing high-dimensional data, we implemented 2 novel search algorithms. The first is a greedy top-down approach modeled on a formerly implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. CONCLUSION: With our additions we have significantly enhanced ARX's ability to handle high-dimensional data in terms of both processing performance and usability, and can thus further facilitate data sharing.
format Online
Article
Text
id pubmed-8489190
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-84891902021-10-05 A scalable software solution for anonymizing high-dimensional biomedical data Meurers, Thierry Bild, Raffael Do, Kieu-Mi Prasser, Fabian Gigascience Technical Note BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming the data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging and particularly difficult when processing complex datasets that contain a high number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional, biomedical datasets. FINDINGS: For improving ARX's capability to find optimal transformations when processing high-dimensional data, we implement 2 novel search algorithms. The first is a greedy top-down approach and is oriented on a formally implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. CONCLUSION: With our additions we have significantly enhanced ARX's ability to handle high-dimensional data in terms of processing performance as well as usability and thus can further facilitate data sharing. Oxford University Press 2021-10-04 /pmc/articles/PMC8489190/ /pubmed/34605868 http://dx.doi.org/10.1093/gigascience/giab068 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A scalable software solution for anonymizing high-dimensional biomedical data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8489190/
https://www.ncbi.nlm.nih.gov/pubmed/34605868
http://dx.doi.org/10.1093/gigascience/giab068