A scalable software solution for anonymizing high-dimensional biomedical data
BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming the data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging and particularly difficult when processing c...
Main authors: | Meurers, Thierry; Bild, Raffael; Do, Kieu-Mi; Prasser, Fabian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8489190/ https://www.ncbi.nlm.nih.gov/pubmed/34605868 http://dx.doi.org/10.1093/gigascience/giab068 |
_version_ | 1784578303771803648 |
---|---|
author | Meurers, Thierry Bild, Raffael Do, Kieu-Mi Prasser, Fabian |
author_facet | Meurers, Thierry Bild, Raffael Do, Kieu-Mi Prasser, Fabian |
author_sort | Meurers, Thierry |
collection | PubMed |
description | BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming the data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging and particularly difficult when processing complex datasets that contain a high number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional, biomedical datasets. FINDINGS: For improving ARX's capability to find optimal transformations when processing high-dimensional data, we implemented 2 novel search algorithms. The first is a greedy top-down approach and is oriented on a formally implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. CONCLUSION: With our additions we have significantly enhanced ARX's ability to handle high-dimensional data in terms of processing performance as well as usability and thus can further facilitate data sharing. |
format | Online Article Text |
id | pubmed-8489190 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-84891902021-10-05 A scalable software solution for anonymizing high-dimensional biomedical data Meurers, Thierry Bild, Raffael Do, Kieu-Mi Prasser, Fabian Gigascience Technical Note BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming the data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging and particularly difficult when processing complex datasets that contain a high number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional, biomedical datasets. FINDINGS: For improving ARX's capability to find optimal transformations when processing high-dimensional data, we implement 2 novel search algorithms. The first is a greedy top-down approach and is oriented on a formally implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. CONCLUSION: With our additions we have significantly enhanced ARX's ability to handle high-dimensional data in terms of processing performance as well as usability and thus can further facilitate data sharing. Oxford University Press 2021-10-04 /pmc/articles/PMC8489190/ /pubmed/34605868 http://dx.doi.org/10.1093/gigascience/giab068 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Meurers, Thierry Bild, Raffael Do, Kieu-Mi Prasser, Fabian A scalable software solution for anonymizing high-dimensional biomedical data |
title | A scalable software solution for anonymizing high-dimensional biomedical data |
title_full | A scalable software solution for anonymizing high-dimensional biomedical data |
title_fullStr | A scalable software solution for anonymizing high-dimensional biomedical data |
title_full_unstemmed | A scalable software solution for anonymizing high-dimensional biomedical data |
title_short | A scalable software solution for anonymizing high-dimensional biomedical data |
title_sort | scalable software solution for anonymizing high-dimensional biomedical data |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8489190/ https://www.ncbi.nlm.nih.gov/pubmed/34605868 http://dx.doi.org/10.1093/gigascience/giab068 |
work_keys_str_mv | AT meurersthierry ascalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata AT bildraffael ascalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata AT dokieumi ascalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata AT prasserfabian ascalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata AT meurersthierry scalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata AT bildraffael scalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata AT dokieumi scalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata AT prasserfabian scalablesoftwaresolutionforanonymizinghighdimensionalbiomedicaldata |