
Federated Learning for Sparse Bayesian Models with Applications to Electronic Health Records and Genomics

Federated learning is becoming increasingly popular as concern about privacy breaches rises across disciplines, including the biological and biomedical fields. The main idea is to train models locally on each server, using data available only to that server, and aggregate the model (not...


Bibliographic Details
Main Authors: Kidd, Brian; Wang, Kunbo; Xu, Yanxun; Ni, Yang
Format: Online Article Text
Language: English
Published: 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9782716/
https://www.ncbi.nlm.nih.gov/pubmed/36541002
author Kidd, Brian
Wang, Kunbo
Xu, Yanxun
Ni, Yang
collection PubMed
description Federated learning is becoming increasingly popular as concern about privacy breaches rises across disciplines, including the biological and biomedical fields. The main idea is to train models locally on each server, using data available only to that server, and to aggregate the model (not data) information at the global level. While federated learning has made significant advances for machine learning methods such as deep neural networks, to the best of our knowledge, its development for sparse Bayesian models is still lacking. Sparse Bayesian models are highly interpretable and provide natural uncertainty quantification, a desirable property for many scientific problems. However, without a federated learning algorithm, their applicability to sensitive biological/biomedical data from multiple sources is limited. Therefore, to fill this gap in the literature, we propose a new Bayesian federated learning framework that is capable of pooling information from different data sources without breaching privacy. The proposed method is conceptually simple to understand and implement, accommodates sampling heterogeneity (i.e., non-iid observations) across data sources, and allows for principled uncertainty quantification. We illustrate the proposed framework with three concrete sparse Bayesian models, namely, sparse regression, Markov random field, and directed graphical models. The application of these three models is demonstrated through three real data examples: a multi-hospital COVID-19 study, breast cancer protein-protein interaction networks, and gene regulatory networks.
format Online
Article
Text
id pubmed-9782716
institution National Center for Biotechnology Information
language English
publishDate 2023
record_format MEDLINE/PubMed
spelling pubmed-9782716 2023-01-01 Federated Learning for Sparse Bayesian Models with Applications to Electronic Health Records and Genomics Kidd, Brian Wang, Kunbo Xu, Yanxun Ni, Yang Pac Symp Biocomput Article
2023 /pmc/articles/PMC9782716/ /pubmed/36541002 Text en https://creativecommons.org/licenses/by-nc/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.