The risk of re-identification remains high even in country-scale location datasets

Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1...

Bibliographic Details
Main authors: Farzanehfar, Ali; Houssiau, Florimond; de Montjoye, Yves-Alexandre
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7961185/
https://www.ncbi.nlm.nih.gov/pubmed/33748793
http://dx.doi.org/10.1016/j.patter.2021.100204
_version_ 1783665204245037056
author Farzanehfar, Ali
Houssiau, Florimond
de Montjoye, Yves-Alexandre
author_facet Farzanehfar, Ali
Houssiau, Florimond
de Montjoye, Yves-Alexandre
author_sort Farzanehfar, Ali
collection PubMed
description Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets.
format Online
Article
Text
id pubmed-7961185
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-79611852021-03-19 The risk of re-identification remains high even in country-scale location datasets Farzanehfar, Ali Houssiau, Florimond de Montjoye, Yves-Alexandre Patterns (N Y) Article Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets. Elsevier 2021-03-12 /pmc/articles/PMC7961185/ /pubmed/33748793 http://dx.doi.org/10.1016/j.patter.2021.100204 Text en © 2021 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Farzanehfar, Ali
Houssiau, Florimond
de Montjoye, Yves-Alexandre
The risk of re-identification remains high even in country-scale location datasets
title The risk of re-identification remains high even in country-scale location datasets
title_full The risk of re-identification remains high even in country-scale location datasets
title_fullStr The risk of re-identification remains high even in country-scale location datasets
title_full_unstemmed The risk of re-identification remains high even in country-scale location datasets
title_short The risk of re-identification remains high even in country-scale location datasets
title_sort risk of re-identification remains high even in country-scale location datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7961185/
https://www.ncbi.nlm.nih.gov/pubmed/33748793
http://dx.doi.org/10.1016/j.patter.2021.100204