
A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health


Bibliographic Details
Main authors: Azizi, Zahra, Lindner, Simon, Shiba, Yumika, Raparelli, Valeria, Norris, Colleen M., Kublickiene, Karolina, Herrero, Maria Trinidad, Kautzky-Willer, Alexandra, Klimek, Peter, Gisinger, Teresa, Pilote, Louise, El Emam, Khaled
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10352377/
https://www.ncbi.nlm.nih.gov/pubmed/37460705
http://dx.doi.org/10.1038/s41598-023-38457-3
author Azizi, Zahra
Lindner, Simon
Shiba, Yumika
Raparelli, Valeria
Norris, Colleen M.
Kublickiene, Karolina
Herrero, Maria Trinidad
Kautzky-Willer, Alexandra
Klimek, Peter
Gisinger, Teresa
Pilote, Louise
El Emam, Khaled
author_sort Azizi, Zahra
collection PubMed
description Sharing health data for research purposes across international jurisdictions has been a challenge due to privacy concerns. Two privacy-enhancing technologies that can enable such sharing are synthetic data generation (SDG) and federated analysis, but their relative strengths and weaknesses have not been evaluated thus far. In this study we compared SDG with federated analysis to enable such international comparative studies. The objective of the analysis was to assess country-level differences in the role of sex on cardiovascular health (CVH) using a pooled dataset of Canadian and Austrian individuals. The Canadian data was synthesized and sent to the Austrian team for analysis. The utility of the pooled (synthetic Canadian + real Austrian) dataset was evaluated by comparing the regression results from the two approaches. The privacy of the Canadian synthetic data was assessed using a membership disclosure test, which showed an F1 score of 0.001, indicating low privacy risk. The outcome variable of interest was CVH, calculated through a modified CANHEART index. The main and interaction effect parameter estimates of the federated and pooled analyses were consistent and directionally the same. It took approximately one month to set up the synthetic data generation platform and generate the synthetic data, whereas it took over 1.5 years to set up the federated analysis system. Synthetic data generation can be an efficient and effective tool for enabling multi-jurisdictional studies while addressing privacy concerns.
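As a rough illustration of the F1 score mentioned in the description, the sketch below computes a plain F1 for a membership guess. It is a minimal sketch under stated assumptions, not the study's procedure: this record does not say how the membership disclosure test was run or which F1 variant was reported, and the function name `f1_score` and the record IDs are hypothetical.

```python
def f1_score(actual_members, predicted_members):
    """Plain F1 for a membership guess.

    Both arguments are sets of record IDs (hypothetical inputs):
    actual_members    -- records that really were in the training data
    predicted_members -- records an attacker claims were in the training data
    """
    true_positives = len(actual_members & predicted_members)
    if true_positives == 0:
        # No correct guesses: precision and recall are both zero.
        return 0.0
    precision = true_positives / len(predicted_members)
    recall = true_positives / len(actual_members)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative IDs only: 2 of 6 guesses are correct, 2 of 4 members are found.
print(f1_score({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}))
```

A score near zero means the attacker's guesses about who was in the training data are almost never right, which is the sense in which a low F1 is read as low membership disclosure risk.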
format Online
Article
Text
id pubmed-10352377
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10352377 2023-07-19 Sci Rep Article
Nature Publishing Group UK 2023-07-17 /pmc/articles/PMC10352377/ /pubmed/37460705 http://dx.doi.org/10.1038/s41598-023-38457-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health
topic Article