
A Python library to check the level of anonymity of a dataset

Bibliographic Details
Main authors: Sáinz-Pardo Díaz, Judith; López García, Álvaro
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9791635/
https://www.ncbi.nlm.nih.gov/pubmed/36572676
http://dx.doi.org/10.1038/s41597-022-01894-2
Description
Summary: Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, (α,k)-anonymity, ℓ-diversity, entropy ℓ-diversity, recursive (c,ℓ)-diversity, t-closeness, basic β-likeness, enhanced β-likeness and δ-disclosure privacy. For datasets with more than one sensitive attribute, two approaches are proposed for evaluating these techniques. The main strength of this library is that it produces a full report of the parameters fulfilled for each of the techniques mentioned above, requiring only the set of quasi-identifiers and sensitive attributes as input. The implemented methods are presented together with the attacks they prevent, a description of the library, usage examples for its functions, and the impact and possible applications that can be developed. Finally, some aspects that could be incorporated in future updates are proposed.
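
To illustrate what the first checks named in the summary measure, here is a minimal sketch in plain Python of k-anonymity and ℓ-diversity computed from a set of quasi-identifiers and one sensitive attribute. This is not pyCANON's API: the function names, the column names and the toy records are all hypothetical, chosen only for illustration, and the sketch assumes a non-empty dataset.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same values on every quasi-identifier."""
    classes = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return classes


def k_anonymity(records, quasi_identifiers):
    """Largest k such that every equivalence class holds at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(len(rows) for rows in classes.values())


def l_diversity(records, quasi_identifiers, sensitive_attribute):
    """Largest l such that every class has at least l distinct sensitive values."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(
        len({row[sensitive_attribute] for row in rows})
        for rows in classes.values()
    )


# Hypothetical, already generalized toy records (assumption, not real data).
records = [
    {"age": "30-39", "zip": "390**", "disease": "flu"},
    {"age": "30-39", "zip": "390**", "disease": "asthma"},
    {"age": "40-49", "zip": "391**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "flu"},
    {"age": "40-49", "zip": "391**", "disease": "diabetes"},
]

k = k_anonymity(records, ["age", "zip"])
l = l_diversity(records, ["age", "zip"], "disease")
print(f"k = {k}, l = {l}")
```

Both values are taken as a minimum over the equivalence classes because each guarantee must hold for the least protected group of records: a single small or homogeneous class lowers the level for the whole dataset.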