Some examples of privacy-preserving sharing of COVID-19 pandemic data with statistical utility evaluation

Bibliographic Details
Main Authors: Liu, Fang; Wang, Dong; Yan, Tian
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198035/
https://www.ncbi.nlm.nih.gov/pubmed/37208606
http://dx.doi.org/10.1186/s12874-023-01927-3
author Liu, Fang
Wang, Dong
Yan, Tian
collection PubMed
description BACKGROUND: A considerable amount of various types of data have been collected during the COVID-19 pandemic, the analysis and understanding of which have been indispensable for curbing the spread of the disease. As the pandemic moves to an endemic state, the data collected during the pandemic will continue to be rich sources for further studying and understanding the impacts of the pandemic on various aspects of our society. On the other hand, naïve release and sharing of the information can be associated with serious privacy concerns.
METHODS: We use three common but distinct data types collected during the pandemic (case surveillance tabular data, case location data, and contact tracing networks) to illustrate the publication and sharing of granular information and individual-level pandemic data in a privacy-preserving manner. We leverage and build upon the concept of differential privacy to generate and release privacy-preserving data for each data type. We investigate the inferential utility of privacy-preserving information through simulation studies at different levels of privacy guarantees and demonstrate the approaches in real-life data. All the approaches employed in the study are straightforward to apply.
RESULTS: The empirical studies in all three data cases suggest that privacy-preserving results based on the differentially privately sanitized data can be similar to the original results at a reasonably small privacy loss ([Formula: see text]). Statistical inferences based on sanitized data using the multiple synthesis technique also appear valid, with nominal coverage of 95% confidence intervals when there is no noticeable bias in point estimation. When [Formula: see text] and the sample size is not large enough, some privacy-preserving results are subject to bias, partially due to the bounding applied to sanitized data as a post-processing step to satisfy practical data constraints.
CONCLUSIONS: Our study generates statistical evidence on the practical feasibility of sharing pandemic data with privacy guarantees and on how to balance the statistical utility of released information during this process.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-01927-3.
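The bias that the abstract attributes to bounding can be illustrated with a minimal sketch. It assumes the Laplace mechanism applied to a single count with sensitivity 1 (one individual added or removed), followed by clamping at zero; this is a standard differential privacy construction chosen for illustration and is not necessarily the mechanism used by the authors. All function names and parameter values below are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sanitize_counts(counts, epsilon, rng, sensitivity=1.0, lower=0.0):
    # Laplace mechanism: add noise with scale = sensitivity / epsilon.
    # Assumption: sensitivity 1, i.e. one person changes one count by 1.
    scale = sensitivity / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    # Post-processing: clamp to the valid range (counts cannot be negative).
    # This keeps the privacy guarantee but can bias small counts upward.
    return [max(lower, v) for v in noisy]


def mean_released_value(true_count, epsilon, n_rep=20000, seed=0):
    # Monte Carlo average of the released (noisy, clamped) count.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        total += sanitize_counts([true_count], epsilon, rng)[0]
    return total / n_rep


if __name__ == "__main__":
    # Small count, strict privacy: clamping pushes the average above the truth.
    print(mean_released_value(true_count=2, epsilon=0.1))
    # Large count, looser privacy: clamping almost never triggers.
    print(mean_released_value(true_count=200, epsilon=1.0))
```

In this sketch the average released value for a small count at a small privacy budget sits well above the true count, while a large count at a larger budget is essentially unbiased, which is consistent with the qualitative pattern described in the RESULTS paragraph.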
format Online
Article
Text
id pubmed-10198035
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10198035 2023-05-21 Some examples of privacy-preserving sharing of COVID-19 pandemic data with statistical utility evaluation Liu, Fang Wang, Dong Yan, Tian BMC Med Res Methodol Research BioMed Central 2023-05-19 /pmc/articles/PMC10198035/ /pubmed/37208606 http://dx.doi.org/10.1186/s12874-023-01927-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Some examples of privacy-preserving sharing of COVID-19 pandemic data with statistical utility evaluation
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198035/
https://www.ncbi.nlm.nih.gov/pubmed/37208606
http://dx.doi.org/10.1186/s12874-023-01927-3