HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis

In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, for PPDS, federated generative models with differential privacy are used by the existing methods. Unfortunately, the existing models...

Bibliographic Details
Main Authors: Duan, Shaoming; Liu, Chuanyi; Han, Peiyi; Jin, Xiaopeng; Zhang, Xinyi; He, Tianyu; Pan, Hezhong; Xiang, Xiayu
Format: Online Article Text
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858387/
https://www.ncbi.nlm.nih.gov/pubmed/36673229
http://dx.doi.org/10.3390/e25010088
author Duan, Shaoming
Liu, Chuanyi
Han, Peiyi
Jin, Xiaopeng
Zhang, Xinyi
He, Tianyu
Pan, Hezhong
Xiang, Xiayu
collection PubMed
description In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, for PPDS, federated generative models with differential privacy are used by the existing methods. Unfortunately, the existing models apply only to images or text data and not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes) and real-world datasets with highly imbalanced data distributions. Existing methods hardly model such scenarios due to the multimodal distributions in the decentralized continuous columns and highly imbalanced categorical attributes of the clients. To solve these problems, we propose a federated generative model for decentralized tabular data synthesis (HT-Fed-GAN). There are three important parts of HT-Fed-GAN: the federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), which is designed to solve the problem of multimodal distributions; federated conditional one-hot encoding with conditional sampling for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN for privacy-preserving decentralized data modeling. The experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between the data utility and privacy level. For the data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model in terms of machine learning tasks.
format Online
Article
Text
id pubmed-9858387
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9858387 2023-01-21 HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis Duan, Shaoming Liu, Chuanyi Han, Peiyi Jin, Xiaopeng Zhang, Xinyi He, Tianyu Pan, Hezhong Xiang, Xiayu Entropy (Basel) Article
MDPI 2022-12-31 /pmc/articles/PMC9858387/ /pubmed/36673229 http://dx.doi.org/10.3390/e25010088 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis
topic Article