
Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff

Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents’ data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test as well as accompanying exposition has rec...


Bibliographic Details
Main Authors: Petti, Samantha; Flaxman, Abraham
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216402/
https://www.ncbi.nlm.nih.gov/pubmed/32478311
http://dx.doi.org/10.12688/gatesopenres.13089.2
_version_ 1783532408865292288
author Petti, Samantha
Flaxman, Abraham
author_facet Petti, Samantha
Flaxman, Abraham
author_sort Petti, Samantha
collection PubMed
description Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents’ data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test, as well as accompanying exposition, has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data, and we developed an empirical measure of privacy loss to compare the error and privacy of the new approach to that of a (non-differentially private) simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population. When run with a privacy budget of 4.0, it was similar in error and privacy loss to a 90% sample. Conclusions: This work fits into the beginning of a discussion on how to best balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion.
format Online
Article
Text
id pubmed-7216402
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-72164022020-05-29 Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff Petti, Samantha Flaxman, Abraham Gates Open Res Research Article Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents’ data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test, as well as accompanying exposition, has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data, and we developed an empirical measure of privacy loss to compare the error and privacy of the new approach to that of a (non-differentially private) simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population. When run with a privacy budget of 4.0, it was similar in error and privacy loss to a 90% sample. Conclusions: This work fits into the beginning of a discussion on how to best balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion. F1000 Research Limited 2020-04-06 /pmc/articles/PMC7216402/ /pubmed/32478311 http://dx.doi.org/10.12688/gatesopenres.13089.2 Text en Copyright: © 2020 Petti S and Flaxman A http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Petti, Samantha
Flaxman, Abraham
Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff
title Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff
title_full Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff
title_fullStr Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff
title_full_unstemmed Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff
title_short Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff
title_sort differential privacy in the 2020 us census: what will it do? quantifying the accuracy/privacy tradeoff
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7216402/
https://www.ncbi.nlm.nih.gov/pubmed/32478311
http://dx.doi.org/10.12688/gatesopenres.13089.2