The risk of re-identification remains high even in country-scale location datasets

Bibliographic Details
Main authors:	Farzanehfar, Ali; Houssiau, Florimond; de Montjoye, Yves-Alexandre
Format:	Online Article Text
Language:	English
Published:	Elsevier 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7961185/ https://www.ncbi.nlm.nih.gov/pubmed/33748793 http://dx.doi.org/10.1016/j.patter.2021.100204

Description
Summary:	Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets.
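
Illustrative note: the summary measures risk through unicity, the share of people whose trace is the only one in the dataset matching a few known points. The sketch below shows one common way to estimate that quantity empirically. It is not the authors' code and does not reproduce their marginal-distribution model or their lower bound. The function name `estimate_unicity`, the `(antenna_id, hour)` point format, and the toy traces are all assumptions invented for illustration.

```python
import random


def estimate_unicity(traces, p, seed=0):
    """Estimate the fraction of users uniquely identified by p known points.

    traces: dict mapping a user id to a set of hashable points
            (assumed here to be (antenna_id, hour) tuples).
    p:      number of auxiliary points an attacker is assumed to know.
    """
    rng = random.Random(seed)
    # Only users with at least p points can have p points sampled from them.
    eligible = [user for user, points in traces.items() if len(points) >= p]
    if not eligible:
        return 0.0

    unique = 0
    for user in eligible:
        # Auxiliary information: p points drawn from this user's own trace.
        known = set(rng.sample(sorted(traces[user]), p))
        # Count every trace in the dataset that contains all known points.
        matches = sum(1 for points in traces.values() if known <= points)
        if matches == 1:
            unique += 1
    return unique / len(eligible)


# Toy data, invented for illustration only.
toy_traces = {
    "a": {(1, 8), (2, 9), (3, 18)},
    "b": {(1, 8), (2, 9), (4, 18)},
    "c": {(1, 8), (5, 9), (3, 18)},
}

# With p=3 the attacker knows each full trace, and all three traces differ.
print(estimate_unicity(toy_traces, p=3))  # 1.0
```

With fewer known points the estimate depends on which points are sampled, so in practice it would be averaged over many draws and computed on real traces rather than a three-user toy set.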
