A Python library to check the level of anonymity of a dataset
Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymiza...
Main authors: | Sáinz-Pardo Díaz, Judith; López García, Álvaro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9791635/ https://www.ncbi.nlm.nih.gov/pubmed/36572676 http://dx.doi.org/10.1038/s41597-022-01894-2 |
_version_ | 1784859452005941248 |
---|---|
author | Sáinz-Pardo Díaz, Judith; López García, Álvaro |
author_facet | Sáinz-Pardo Díaz, Judith; López García, Álvaro |
author_sort | Sáinz-Pardo Díaz, Judith |
collection | PubMed |
description | Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, (α,k)-anonymity, ℓ-diversity, entropy ℓ-diversity, recursive (c,ℓ)-diversity, t-closeness, basic β-likeness, enhanced β-likeness and δ-disclosure privacy. For the case of more than one sensitive attribute, two approaches are proposed for evaluating these techniques. The main strength of this library is to obtain a full report of the parameters that are fulfilled for each of the techniques mentioned above, with the unique requirement of the set of quasi-identifiers and sensitive attributes. The methods implemented are presented together with the attacks they prevent, the description of the library, examples of the different functions’ usage, as well as the impact and the possible applications that can be developed. Finally, some possible aspects to be incorporated in future updates are proposed. |
format | Online Article Text |
id | pubmed-9791635 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9791635 (2022-12-27). A Python library to check the level of anonymity of a dataset. Sáinz-Pardo Díaz, Judith; López García, Álvaro. Sci Data. Article. Nature Publishing Group UK, 2022-12-26. /pmc/articles/PMC9791635/ /pubmed/36572676 http://dx.doi.org/10.1038/s41597-022-01894-2 Text en. © The Author(s) 2022. https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article. Sáinz-Pardo Díaz, Judith; López García, Álvaro. A Python library to check the level of anonymity of a dataset |
title | A Python library to check the level of anonymity of a dataset |
title_sort | python library to check the level of anonymity of a dataset |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9791635/ https://www.ncbi.nlm.nih.gov/pubmed/36572676 http://dx.doi.org/10.1038/s41597-022-01894-2 |
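
The description above names k-anonymity and ℓ-diversity among the checks that pyCANON performs, given only the quasi-identifiers and the sensitive attributes. As a rough illustration of what those two parameters measure, the sketch below computes them with the Python standard library. It is not pyCANON's API: the function names, the toy records and the column names (`age`, `zip`, `disease`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values for every quasi-identifier."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[qi] for qi in quasi_identifiers)
        classes[key].append(row)
    return list(classes.values())


def k_anonymity(rows, quasi_identifiers):
    """Largest k satisfied: the size of the smallest equivalence class."""
    return min(len(c) for c in equivalence_classes(rows, quasi_identifiers))


def l_diversity(rows, quasi_identifiers, sensitive):
    """Largest l satisfied: the fewest distinct sensitive values in any class."""
    return min(
        len({row[sensitive] for row in c})
        for c in equivalence_classes(rows, quasi_identifiers)
    )


# Hypothetical, already generalized records (illustration only).
records = [
    {"age": "30-39", "zip": "390**", "disease": "flu"},
    {"age": "30-39", "zip": "390**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "diabetes"},
]

qi = ["age", "zip"]
print(k_anonymity(records, qi))             # 2
print(l_diversity(records, qi, "disease"))  # 1
```

In this toy dataset every combination of quasi-identifiers appears at least twice, so k = 2, yet the first class contains a single disease value, so ℓ = 1. That gap is why a report covering several techniques at once is more informative than k-anonymity alone. Both functions raise `ValueError` on an empty dataset, since no equivalence classes exist.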