
Predicting age and gender from network telemetry: Implications for privacy and impact on policy


Bibliographic Details
Main Authors: Kuang, Lida, Pobbathi, Samruda, Mansury, Yuri, Shapiro, Matthew A., Gurbani, Vijay K.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9302812/
https://www.ncbi.nlm.nih.gov/pubmed/35862447
http://dx.doi.org/10.1371/journal.pone.0271714
author Kuang, Lida
Pobbathi, Samruda
Mansury, Yuri
Shapiro, Matthew A.
Gurbani, Vijay K.
collection PubMed
description The systematic monitoring of private communications through information technology pervades the digital age. One result is the potential availability of vast amounts of data tracking the characteristics of mobile network users. Such data is becoming increasingly accessible for commercial use, which raises questions about the degree to which personal information can be protected. Existing regulations may require the removal of personally identifiable information (PII) from datasets before they can be processed, but research now suggests that powerful machine learning classification methods are capable of targeting individuals for personalized marketing purposes, even in the absence of PII. This study aims to demonstrate how machine learning methods can be deployed to extract demographic characteristics. Specifically, we investigate whether key demographics (gender and age) of mobile users can be accurately identified by third parties using deep learning techniques based solely on observations of the user's interactions within the network. Using an anonymized dataset from a Latin American country, we show the relative ease with which PII, in the form of age and gender demographics, can be inferred; specifically, our neural network model estimates gender with an accuracy of 67%, outperforming decision tree, random forest, and gradient boosting models by a significant margin. Neural networks achieve an even higher accuracy of 78% in predicting subscriber age. These results suggest the need for a more robust regulatory framework governing the collection of personal data to safeguard users from predatory practices motivated by fraudulent intentions, prejudices, or consumer manipulation. We discuss in particular how advances in machine learning have chiseled away at a number of General Data Protection Regulation (GDPR) articles designed to protect consumers from the imminent threat of privacy violations.
format Online
Article
Text
id pubmed-9302812
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9302812 2022-07-22 Predicting age and gender from network telemetry: Implications for privacy and impact on policy Kuang, Lida Pobbathi, Samruda Mansury, Yuri Shapiro, Matthew A. Gurbani, Vijay K. PLoS One Research Article Public Library of Science 2022-07-21 /pmc/articles/PMC9302812/ /pubmed/35862447 http://dx.doi.org/10.1371/journal.pone.0271714 Text en © 2022 Kuang et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Predicting age and gender from network telemetry: Implications for privacy and impact on policy
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9302812/
https://www.ncbi.nlm.nih.gov/pubmed/35862447
http://dx.doi.org/10.1371/journal.pone.0271714