HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis
Main authors: | Duan, Shaoming; Liu, Chuanyi; Han, Peiyi; Jin, Xiaopeng; Zhang, Xinyi; He, Tianyu; Pan, Hezhong; Xiang, Xiayu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858387/ https://www.ncbi.nlm.nih.gov/pubmed/36673229 http://dx.doi.org/10.3390/e25010088 |
author | Duan, Shaoming; Liu, Chuanyi; Han, Peiyi; Jin, Xiaopeng; Zhang, Xinyi; He, Tianyu; Pan, Hezhong; Xiang, Xiayu |
---|---|
collection | PubMed |
description | In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, existing PPDS methods use federated generative models with differential privacy. Unfortunately, these models apply only to image or text data, not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes), and real-world tabular datasets often have highly imbalanced data distributions. Existing methods struggle to model such scenarios because of the multimodal distributions of the decentralized continuous columns and the highly imbalanced categorical attributes of the clients. To solve these problems, we propose a federated generative model for decentralized tabular data synthesis (HT-Fed-GAN). HT-Fed-GAN has three main components: the federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), which is designed to handle multimodal distributions; federated conditional one-hot encoding with conditional sampling, for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN for privacy-preserving decentralized data modeling. The experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between data utility and privacy level. In terms of data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables, and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model on machine learning tasks. |
format | Online Article Text |
id | pubmed-9858387 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9858387, 2023-01-21; Entropy (Basel); Article; MDPI, 2022-12-31; /pmc/articles/PMC9858387/; /pubmed/36673229; http://dx.doi.org/10.3390/e25010088; Text; en; © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858387/ https://www.ncbi.nlm.nih.gov/pubmed/36673229 http://dx.doi.org/10.3390/e25010088 |
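The description names federated conditional one-hot encoding with conditional sampling as the component that builds a global categorical representation and rebalances imbalanced attributes. The record gives no algorithmic detail, so what follows is a hedged sketch under stated assumptions: each client reports plain per-category counts for one discrete column to a server; the server merges them into a sorted global vocabulary so that every client shares one one-hot layout; and conditions are drawn with CTGAN-style log-frequency weights so that rare categories are sampled more often than their raw share. All function names (`local_counts`, `global_vocabulary`, `one_hot`, `log_frequency_probs`, `sample_condition`) are hypothetical. Sending exact counts is not privacy-preserving, and the paper's actual protocol, including any noise it adds, may differ.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical helpers: a sketch of federated one-hot encoding plus
# log-frequency conditional sampling for ONE discrete column.
# Assumption: clients share exact category counts (no privacy noise added).


def local_counts(column: Sequence[str]) -> Counter:
    """Client side: count each category in the local column."""
    return Counter(column)


def global_vocabulary(client_counts: List[Counter]) -> Dict[str, int]:
    """Server side: sum client counts into one global count per category.

    Categories are sorted so every client derives the same one-hot layout.
    """
    merged: Counter = Counter()
    for counts in client_counts:
        merged.update(counts)
    return {cat: merged[cat] for cat in sorted(merged)}


def one_hot(value: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a value against the global vocabulary (raises ValueError if unseen)."""
    index = list(vocab).index(value)
    vec = [0] * len(vocab)
    vec[index] = 1
    return vec


def log_frequency_probs(vocab: Dict[str, int]) -> Dict[str, float]:
    """Weight each category by log(1 + count), then normalize to sum to 1.

    Compared with raw frequencies, this shifts probability mass toward rare
    categories, which rebalances the conditions the generator is trained on.
    """
    weights = {cat: math.log1p(n) for cat, n in vocab.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


def sample_condition(vocab: Dict[str, int], rng: random.Random) -> List[int]:
    """Draw one category with log-frequency weights and return its one-hot vector."""
    probs = log_frequency_probs(vocab)
    cats = list(probs)
    chosen = rng.choices(cats, weights=[probs[c] for c in cats], k=1)[0]
    return one_hot(chosen, vocab)


if __name__ == "__main__":
    clients = [["a", "a", "a", "b"], ["a", "c"], ["a", "a", "b"]]
    vocab = global_vocabulary([local_counts(col) for col in clients])
    print(vocab)  # {'a': 6, 'b': 2, 'c': 1}
    print(one_hot("b", vocab))  # [0, 1, 0]
    print(sample_condition(vocab, random.Random(0)))
```

With three clients holding `["a", "a", "a", "b"]`, `["a", "c"]` and `["a", "a", "b"]`, the merged counts are a: 6, b: 2, c: 1. Raw frequencies would give category c a 1/9 share (about 0.11); log-frequency weighting raises it to log 2 / (log 7 + log 3 + log 2), roughly 0.19, while the majority category a drops from about 0.67 to about 0.52.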