A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health
Sharing health data for research purposes across international jurisdictions has been a challenge due to privacy concerns. Two privacy enhancing technologies that can enable such sharing are synthetic data generation (SDG) and federated analysis, but their relative strengths and weaknesses have not...
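Main Authors: | Azizi, Zahra; Lindner, Simon; Shiba, Yumika; Raparelli, Valeria; Norris, Colleen M.; Kublickiene, Karolina; Herrero, Maria Trinidad; Kautzky-Willer, Alexandra; Klimek, Peter; Gisinger, Teresa; Pilote, Louise; El Emam, Khaled |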
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10352377/ https://www.ncbi.nlm.nih.gov/pubmed/37460705 http://dx.doi.org/10.1038/s41598-023-38457-3 |
_version_ | 1785074502024036352 |
---|---|
author | Azizi, Zahra; Lindner, Simon; Shiba, Yumika; Raparelli, Valeria; Norris, Colleen M.; Kublickiene, Karolina; Herrero, Maria Trinidad; Kautzky-Willer, Alexandra; Klimek, Peter; Gisinger, Teresa; Pilote, Louise; El Emam, Khaled |
author_sort | Azizi, Zahra |
collection | PubMed |
description | Sharing health data for research purposes across international jurisdictions has been a challenge due to privacy concerns. Two privacy enhancing technologies that can enable such sharing are synthetic data generation (SDG) and federated analysis, but their relative strengths and weaknesses have not been evaluated thus far. In this study we compared SDG with federated analysis to enable such international comparative studies. The objective of the analysis was to assess country-level differences in the role of sex on cardiovascular health (CVH) using a pooled dataset of Canadian and Austrian individuals. The Canadian data was synthesized and sent to the Austrian team for analysis. The utility of the pooled (synthetic Canadian + real Austrian) dataset was evaluated by comparing the regression results from the two approaches. The privacy of the Canadian synthetic data was assessed using a membership disclosure test which showed an F1 score of 0.001, indicating low privacy risk. The outcome variable of interest was CVH, calculated through a modified CANHEART index. The main and interaction effect parameter estimates of the federated and pooled analyses were consistent and directionally the same. It took approximately one month to set up the synthetic data generation platform and generate the synthetic data, whereas it took over 1.5 years to set up the federated analysis system. Synthetic data generation can be an efficient and effective tool for enabling multi-jurisdictional studies while addressing privacy concerns. |
format | Online Article Text |
id | pubmed-10352377 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10352377 2023-07-19 A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health. Sci Rep, Article. Nature Publishing Group UK 2023-07-17 /pmc/articles/PMC10352377/ /pubmed/37460705 http://dx.doi.org/10.1038/s41598-023-38457-3 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10352377/ https://www.ncbi.nlm.nih.gov/pubmed/37460705 http://dx.doi.org/10.1038/s41598-023-38457-3 |
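
The description above reports a membership disclosure F1 score of 0.001 for the Canadian synthetic data. The record does not describe how that test was run, so the following is only a minimal sketch of how an F1 score can be computed from an attacker's membership guesses. The function name `membership_f1` and the toy labels are hypothetical and are not taken from the article.

```python
def membership_f1(is_member, guessed_member):
    """F1 score of membership guesses against ground truth.

    is_member[i]      -- True if record i was really in the training data
    guessed_member[i] -- True if the (hypothetical) attacker guessed "member"
    """
    pairs = list(zip(is_member, guessed_member))
    tp = sum(1 for m, g in pairs if m and g)        # correct "member" guesses
    fp = sum(1 for m, g in pairs if not m and g)    # wrong "member" guesses
    fn = sum(1 for m, g in pairs if m and not g)    # members the attacker missed
    if tp == 0:
        # No correct positive guess: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed for illustration only).
is_member = [True, True, True, False, False, False]
guessed_member = [True, False, False, True, False, False]
score = membership_f1(is_member, guessed_member)
print(f"membership disclosure F1 = {score:.3f}")
```

A score near zero, as reported in the description, would mean that such guesses almost never identify true members correctly.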