A Python library to check the level of anonymity of a dataset

Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymiza...


Bibliographic Details
Main authors: Sáinz-Pardo Díaz, Judith; López García, Álvaro
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9791635/
https://www.ncbi.nlm.nih.gov/pubmed/36572676
http://dx.doi.org/10.1038/s41597-022-01894-2
author Sáinz-Pardo Díaz, Judith
López García, Álvaro
collection PubMed
description Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, (α,k)-anonymity, ℓ-diversity, entropy ℓ-diversity, recursive (c,ℓ)-diversity, t-closeness, basic β-likeness, enhanced β-likeness and δ-disclosure privacy. For the case of more than one sensitive attribute, two approaches are proposed for evaluating these techniques. The main strength of this library is to obtain a full report of the parameters that are fulfilled for each of the techniques mentioned above, with the unique requirement of the set of quasi-identifiers and sensitive attributes. The methods implemented are presented together with the attacks they prevent, the description of the library, examples of the different functions’ usage, as well as the impact and the possible applications that can be developed. Finally, some possible aspects to be incorporated in future updates are proposed.
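As an illustration of the kind of check the description refers to, the following minimal sketch computes the k of k-anonymity and the ℓ of (distinct) ℓ-diversity for a small table, given only the quasi-identifiers and a sensitive attribute. It is written from the standard definitions of these two measures and does not use or reproduce the pyCANON API; the function names, column names and sample records are hypothetical.

```python
from collections import Counter, defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k for which the table is k-anonymous.

    Rows sharing the same quasi-identifier values form an equivalence
    class; k is the size of the smallest such class.
    """
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the largest l for which the table is (distinct) l-diverse.

    l is the smallest number of distinct sensitive values found in any
    equivalence class.
    """
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return min(len(v) for v in values.values())


# Hypothetical, already generalized records (not taken from the article).
rows = [
    {"age": "20-30", "zip": "390**", "disease": "flu"},
    {"age": "20-30", "zip": "390**", "disease": "asthma"},
    {"age": "30-40", "zip": "391**", "disease": "flu"},
    {"age": "30-40", "zip": "391**", "disease": "flu"},
    {"age": "30-40", "zip": "391**", "disease": "diabetes"},
]

k = k_anonymity(rows, ["age", "zip"])
l = l_diversity(rows, ["age", "zip"], "disease")
print(f"k = {k}, l = {l}")
```

Both functions assume a non-empty list of rows; an empty table raises `ValueError` from `min`.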
format Online
Article
Text
id pubmed-9791635
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9791635 2022-12-27 A Python library to check the level of anonymity of a dataset Sáinz-Pardo Díaz, Judith López García, Álvaro Sci Data Article Nature Publishing Group UK 2022-12-26 /pmc/articles/PMC9791635/ /pubmed/36572676 http://dx.doi.org/10.1038/s41597-022-01894-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A Python library to check the level of anonymity of a dataset
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9791635/
https://www.ncbi.nlm.nih.gov/pubmed/36572676
http://dx.doi.org/10.1038/s41597-022-01894-2