A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data
Training on multiple diverse data sources is critical to ensure unbiased and generalizable AI. In healthcare, data privacy laws prohibit data from being moved outside the country of origin, preventing global medical datasets being centralized for AI training. Data-centric, cross-silo federated learning represents a pathway forward for training on distributed medical datasets. Existing approaches typically require updates to a training model to be transferred to a central server, potentially breaching data privacy laws unless the updates are sufficiently disguised or abstracted to prevent reconstruction of the dataset. Here we present a completely decentralized federated learning approach, using knowledge distillation, ensuring data privacy and protection. Each node operates independently without needing to access external data. AI accuracy using this approach is found to be comparable to centralized training, and when nodes comprise poor-quality data, which is common in healthcare, AI accuracy can exceed the performance of traditional centralized training.
Main Authors: | Nguyen, T. V., Dakka, M. A., Diakiw, S. M., VerMilyea, M. D., Perugini, M., Hall, J. M. M., Perugini, D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9133021/ https://www.ncbi.nlm.nih.gov/pubmed/35614106 http://dx.doi.org/10.1038/s41598-022-12833-x |
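The abstract describes a fully decentralized, cross-silo scheme based on knowledge distillation, in which every node trains independently and no raw data or reconstructable model updates leave a silo. The code below is a minimal, hypothetical PyTorch sketch of that general idea, not the authors' published protocol: the `Node` class, `make_model` helper, hyperparameters, and the choice to distill peers' soft labels on each node's own local data are assumptions made for illustration.

```python
# Hypothetical sketch only: illustrates the general idea of decentralized,
# cross-silo federated learning via knowledge distillation. Each node trains
# on its own private data and later distills the soft predictions that peer
# models make on that same local data, so raw records never leave a node.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_model(in_dim: int = 20, n_classes: int = 2) -> nn.Module:
    """Small classifier standing in for a real medical model."""
    return nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_classes))


class Node:
    """One data silo: holds private data and its own locally trained model."""

    def __init__(self, x: torch.Tensor, y: torch.Tensor):
        self.x, self.y = x, y          # private data, never shared
        self.model = make_model(x.shape[1])

    def train_local(self, epochs: int = 20) -> None:
        # Ordinary supervised training on the node's own data only.
        opt = torch.optim.Adam(self.model.parameters(), lr=1e-2)
        for _ in range(epochs):
            opt.zero_grad()
            F.cross_entropy(self.model(self.x), self.y).backward()
            opt.step()

    def distill_from(self, peers: list, epochs: int = 20, T: float = 2.0,
                     alpha: float = 0.5) -> None:
        # Average the peers' temperature-softened predictions on *this node's*
        # data; only trained peer models travel between nodes, never the data.
        with torch.no_grad():
            teacher = torch.stack(
                [F.softmax(m(self.x) / T, dim=1) for m in peers]).mean(0)
        opt = torch.optim.Adam(self.model.parameters(), lr=1e-2)
        for _ in range(epochs):
            opt.zero_grad()
            logits = self.model(self.x)
            hard = F.cross_entropy(logits, self.y)
            soft = F.kl_div(F.log_softmax(logits / T, dim=1), teacher,
                            reduction="batchmean") * T * T
            (alpha * hard + (1 - alpha) * soft).backward()
            opt.step()


# Toy run: three nodes with synthetic "silos" of data.
torch.manual_seed(0)
nodes = [Node(torch.randn(64, 20), torch.randint(0, 2, (64,))) for _ in range(3)]
for n in nodes:
    n.train_local()
for i, n in enumerate(nodes):
    n.distill_from([p.model for j, p in enumerate(nodes) if j != i])
```

In this sketch only model objects (equivalently, their predictions on each node's own data) are exchanged, which is what keeps the raw records inside each jurisdiction; how the published approach exchanges distilled knowledge and handles poor-quality nodes is detailed in the article itself.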
_version_ | 1784713504894222336 |
---|---|
author | Nguyen, T. V. Dakka, M. A. Diakiw, S. M. VerMilyea, M. D. Perugini, M. Hall, J. M. M. Perugini, D. |
author_facet | Nguyen, T. V. Dakka, M. A. Diakiw, S. M. VerMilyea, M. D. Perugini, M. Hall, J. M. M. Perugini, D. |
author_sort | Nguyen, T. V. |
collection | PubMed |
description | Training on multiple diverse data sources is critical to ensure unbiased and generalizable AI. In healthcare, data privacy laws prohibit data from being moved outside the country of origin, preventing global medical datasets being centralized for AI training. Data-centric, cross-silo federated learning represents a pathway forward for training on distributed medical datasets. Existing approaches typically require updates to a training model to be transferred to a central server, potentially breaching data privacy laws unless the updates are sufficiently disguised or abstracted to prevent reconstruction of the dataset. Here we present a completely decentralized federated learning approach, using knowledge distillation, ensuring data privacy and protection. Each node operates independently without needing to access external data. AI accuracy using this approach is found to be comparable to centralized training, and when nodes comprise poor-quality data, which is common in healthcare, AI accuracy can exceed the performance of traditional centralized training. |
format | Online Article Text |
id | pubmed-9133021 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9133021 2022-05-27 A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data Nguyen, T. V. Dakka, M. A. Diakiw, S. M. VerMilyea, M. D. Perugini, M. Hall, J. M. M. Perugini, D. Sci Rep Article Training on multiple diverse data sources is critical to ensure unbiased and generalizable AI. In healthcare, data privacy laws prohibit data from being moved outside the country of origin, preventing global medical datasets being centralized for AI training. Data-centric, cross-silo federated learning represents a pathway forward for training on distributed medical datasets. Existing approaches typically require updates to a training model to be transferred to a central server, potentially breaching data privacy laws unless the updates are sufficiently disguised or abstracted to prevent reconstruction of the dataset. Here we present a completely decentralized federated learning approach, using knowledge distillation, ensuring data privacy and protection. Each node operates independently without needing to access external data. AI accuracy using this approach is found to be comparable to centralized training, and when nodes comprise poor-quality data, which is common in healthcare, AI accuracy can exceed the performance of traditional centralized training. Nature Publishing Group UK 2022-05-25 /pmc/articles/PMC9133021/ /pubmed/35614106 http://dx.doi.org/10.1038/s41598-022-12833-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Nguyen, T. V. Dakka, M. A. Diakiw, S. M. VerMilyea, M. D. Perugini, M. Hall, J. M. M. Perugini, D. A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data |
title | A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data |
title_full | A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data |
title_fullStr | A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data |
title_full_unstemmed | A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data |
title_short | A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data |
title_sort | novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9133021/ https://www.ncbi.nlm.nih.gov/pubmed/35614106 http://dx.doi.org/10.1038/s41598-022-12833-x |
work_keys_str_mv | AT nguyentv anoveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT dakkama anoveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT diakiwsm anoveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT vermilyeamd anoveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT peruginim anoveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT halljmm anoveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT peruginid anoveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT nguyentv noveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT dakkama noveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT diakiwsm noveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT vermilyeamd noveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT peruginim noveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT halljmm noveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata AT peruginid noveldecentralizedfederatedlearningapproachtotrainongloballydistributedpoorqualityandprotectedprivatemedicaldata |