The risk of re-identification remains high even in country-scale location datasets
Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1...
Main authors: | , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7961185/ https://www.ncbi.nlm.nih.gov/pubmed/33748793 http://dx.doi.org/10.1016/j.patter.2021.100204 |
author | Farzanehfar, Ali; Houssiau, Florimond; de Montjoye, Yves-Alexandre |
---|---|
collection | PubMed |
description | Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets. |
format | Online Article Text |
id | pubmed-7961185 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
journal | Patterns (N Y) |
online publication date | 2021-03-12 |
rights | © 2021 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | The risk of re-identification remains high even in country-scale location datasets |
topic | Article |
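The description above measures re-identification risk as "unicity" given a number of points of auxiliary information. The following is a minimal, brute-force Monte Carlo sketch of that quantity, not the authors' estimator or their three-marginal model. It assumes unicity means the fraction of individuals whose trace is the only one in the dataset containing p points drawn at random from it. The point format (a hypothetical antenna identifier paired with an hour), the restriction to users with at least p points, and the function names `is_unique` and `estimate_unicity` are illustrative assumptions.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical point format: (antenna_id, hour). A trace is the set of points
# at which a user was observed. Both are illustrative assumptions.
Point = Tuple[str, int]
Traces = Dict[str, Set[Point]]


def is_unique(user: str, traces: Traces, p: int, rng: random.Random) -> bool:
    """Draw p points from `user`'s trace and report whether no other
    user's trace also contains all of them."""
    # sorted() gives random.sample a sequence and makes runs reproducible per seed.
    known: List[Point] = rng.sample(sorted(traces[user]), p)
    matches = sum(1 for trace in traces.values() if all(pt in trace for pt in known))
    return matches == 1  # the target user always matches, so 1 means unique


def estimate_unicity(traces: Traces, p: int, n_samples: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of users singled out by p points.

    Assumption: only users with at least p points are eligible to be sampled.
    """
    rng = random.Random(seed)
    eligible = [u for u, trace in traces.items() if len(trace) >= p]
    if not eligible:
        raise ValueError("no user has at least p points")
    unique = 0
    for _ in range(n_samples):
        user = rng.choice(eligible)
        unique += is_unique(user, traces, p, rng)
    return unique / n_samples
```

For example, `estimate_unicity({"a": {("x", 1)}, "b": {("y", 2)}}, p=1)` returns 1.0, because each single point belongs to one user only. Checking every trace for each sample costs time proportional to the dataset size, which is one reason a model-based approximation is attractive at the country scale the description discusses.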