
Privacy and uniqueness of neighborhoods in social networks

The ability to share social network data at the level of individual connections is beneficial to science: not only for reproducing results, but also for researchers who may wish to use it for purposes not foreseen by the data releaser. Sharing such data, however, can lead to serious privacy issues,...


Bibliographic Details
Main authors:	Romanini, Daniele; Lehmann, Sune; Kivelä, Mikko
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8505500/ https://www.ncbi.nlm.nih.gov/pubmed/34635678 http://dx.doi.org/10.1038/s41598-021-94283-5

author	Romanini, Daniele; Lehmann, Sune; Kivelä, Mikko
collection	PubMed
description	The ability to share social network data at the level of individual connections is beneficial to science: not only for reproducing results, but also for researchers who may wish to use it for purposes not foreseen by the data releaser. Sharing such data, however, can lead to serious privacy issues, because individuals could be re-identified, not only based on possible nodes’ attributes, but also from the structure of the network around them. The risk associated with re-identification can be measured and it is more serious in some networks than in others. While various optimization algorithms have been proposed to anonymize networks, there is still only a limited theoretical understanding of which network features are important for the privacy problem. Using network models and real data, we show that the average degree of networks is a crucial parameter for the severity of re-identification risk from nodes’ neighborhoods. Dense networks are more at risk, and, apart from a small band of average degree values, either almost all nodes are uniquely re-identifiable or they are all safe. Our results allow researchers to assess the privacy risk based on a small number of network statistics which are available even before the data is collected. As a rule-of-thumb, the privacy risks are high if the average degree is above 10. Guided by these results, we explore sampling of edges as a strategy to mitigate the re-identification risk of nodes. This approach can be implemented during the data collection phase, and its effect on various network measures can be estimated and corrected using sampling theory. The new understanding of the uniqueness of neighborhoods in networks presented in this work can support the development of privacy-aware ways of designing network data collection procedures, anonymization methods, and sharing network data.
format	Online Article Text
id	pubmed-8505500
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-8505500 2021-10-13 Privacy and uniqueness of neighborhoods in social networks. Romanini, Daniele; Lehmann, Sune; Kivelä, Mikko. Sci Rep. Article. Nature Publishing Group UK 2021-10-11 /pmc/articles/PMC8505500/ /pubmed/34635678 http://dx.doi.org/10.1038/s41598-021-94283-5 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title	Privacy and uniqueness of neighborhoods in social networks
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8505500/ https://www.ncbi.nlm.nih.gov/pubmed/34635678 http://dx.doi.org/10.1038/s41598-021-94283-5

Similar items