
Privacy-first health research with federated learning

Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show—on a diverse set of single and multi-site health studies—that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research—across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science—aspects that used to be at odds with each other.


Bibliographic Details
Main Authors: Sadilek, Adam, Liu, Luyang, Nguyen, Dung, Kamruzzaman, Methun, Serghiou, Stylianos, Rader, Benjamin, Ingerman, Alex, Mellem, Stefan, Kairouz, Peter, Nsoesie, Elaine O., MacFarlane, Jamie, Vullikanti, Anil, Marathe, Madhav, Eastham, Paul, Brownstein, John S., Arcas, Blaise Aguera y., Howell, Michael D., Hernandez, John
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423792/
https://www.ncbi.nlm.nih.gov/pubmed/34493770
http://dx.doi.org/10.1038/s41746-021-00489-2
_version_ 1783749542337839104
author Sadilek, Adam
Liu, Luyang
Nguyen, Dung
Kamruzzaman, Methun
Serghiou, Stylianos
Rader, Benjamin
Ingerman, Alex
Mellem, Stefan
Kairouz, Peter
Nsoesie, Elaine O.
MacFarlane, Jamie
Vullikanti, Anil
Marathe, Madhav
Eastham, Paul
Brownstein, John S.
Arcas, Blaise Aguera y.
Howell, Michael D.
Hernandez, John
author_facet Sadilek, Adam
Liu, Luyang
Nguyen, Dung
Kamruzzaman, Methun
Serghiou, Stylianos
Rader, Benjamin
Ingerman, Alex
Mellem, Stefan
Kairouz, Peter
Nsoesie, Elaine O.
MacFarlane, Jamie
Vullikanti, Anil
Marathe, Madhav
Eastham, Paul
Brownstein, John S.
Arcas, Blaise Aguera y.
Howell, Michael D.
Hernandez, John
author_sort Sadilek, Adam
collection PubMed
description Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show—on a diverse set of single and multi-site health studies—that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research—across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science—aspects that used to be at odds with each other.
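As a rough, assumption-laden illustration of the approach summarized in this description, the sketch below simulates federated averaging in which each simulated site computes a clipped model update on synthetic local data, and a server aggregates those updates with added Gaussian noise, in the spirit of differential privacy. The site count, clipping bound, noise multiplier, and synthetic data are invented for illustration; this is not the study's actual models, units of federation, or privacy accounting.

    # Minimal sketch (not the authors' pipeline): federated averaging with
    # clipped per-site updates and server-side Gaussian noise. All data,
    # site counts, and privacy parameters below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    NUM_SITES = 5           # assumed units of federation (e.g., hospitals)
    ROUNDS = 50
    LOCAL_LR = 0.5
    CLIP_NORM = 1.0         # bound on each site's contribution
    NOISE_MULTIPLIER = 0.1  # assumed noise scale; real deployments calibrate this

    def make_site_data(n=200, dim=4):
        """Synthetic per-site data standing in for private health records."""
        x = rng.normal(size=(n, dim))
        true_w = np.array([1.5, -2.0, 0.5, 0.0])
        y = (x @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
        return x, y

    sites = [make_site_data() for _ in range(NUM_SITES)]
    w = np.zeros(4)  # global model weights held by the server

    def local_update(w, x, y):
        """One logistic-regression gradient step on a site's private data.

        Only the clipped update leaves the site, never the records themselves.
        """
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        grad = x.T @ (p - y) / len(y)
        update = -LOCAL_LR * grad
        norm = np.linalg.norm(update)
        return update * min(1.0, CLIP_NORM / norm) if norm > 0 else update

    for _ in range(ROUNDS):
        updates = [local_update(w, x, y) for x, y in sites]
        # Server averages clipped updates and adds Gaussian noise before applying.
        noise = rng.normal(scale=NOISE_MULTIPLIER * CLIP_NORM / NUM_SITES, size=w.shape)
        w = w + np.mean(updates, axis=0) + noise

    print("federated weights:", np.round(w, 2))

The property mirrored here is the one the description emphasizes: only bounded, noised model updates are shared with the server, while the underlying data stay at each site.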
format Online
Article
Text
id pubmed-8423792
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-84237922021-09-14 Privacy-first health research with federated learning Sadilek, Adam Liu, Luyang Nguyen, Dung Kamruzzaman, Methun Serghiou, Stylianos Rader, Benjamin Ingerman, Alex Mellem, Stefan Kairouz, Peter Nsoesie, Elaine O. MacFarlane, Jamie Vullikanti, Anil Marathe, Madhav Eastham, Paul Brownstein, John S. Arcas, Blaise Aguera y. Howell, Michael D. Hernandez, John NPJ Digit Med Article Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show—on a diverse set of single and multi-site health studies—that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research—across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science—aspects that used to be at odds with each other. Nature Publishing Group UK 2021-09-07 /pmc/articles/PMC8423792/ /pubmed/34493770 http://dx.doi.org/10.1038/s41746-021-00489-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Article
Sadilek, Adam
Liu, Luyang
Nguyen, Dung
Kamruzzaman, Methun
Serghiou, Stylianos
Rader, Benjamin
Ingerman, Alex
Mellem, Stefan
Kairouz, Peter
Nsoesie, Elaine O.
MacFarlane, Jamie
Vullikanti, Anil
Marathe, Madhav
Eastham, Paul
Brownstein, John S.
Arcas, Blaise Aguera y.
Howell, Michael D.
Hernandez, John
Privacy-first health research with federated learning
title Privacy-first health research with federated learning
title_full Privacy-first health research with federated learning
title_fullStr Privacy-first health research with federated learning
title_full_unstemmed Privacy-first health research with federated learning
title_short Privacy-first health research with federated learning
title_sort privacy-first health research with federated learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423792/
https://www.ncbi.nlm.nih.gov/pubmed/34493770
http://dx.doi.org/10.1038/s41746-021-00489-2
work_keys_str_mv AT sadilekadam privacyfirsthealthresearchwithfederatedlearning
AT liuluyang privacyfirsthealthresearchwithfederatedlearning
AT nguyendung privacyfirsthealthresearchwithfederatedlearning
AT kamruzzamanmethun privacyfirsthealthresearchwithfederatedlearning
AT serghioustylianos privacyfirsthealthresearchwithfederatedlearning
AT raderbenjamin privacyfirsthealthresearchwithfederatedlearning
AT ingermanalex privacyfirsthealthresearchwithfederatedlearning
AT mellemstefan privacyfirsthealthresearchwithfederatedlearning
AT kairouzpeter privacyfirsthealthresearchwithfederatedlearning
AT nsoesieelaineo privacyfirsthealthresearchwithfederatedlearning
AT macfarlanejamie privacyfirsthealthresearchwithfederatedlearning
AT vullikantianil privacyfirsthealthresearchwithfederatedlearning
AT marathemadhav privacyfirsthealthresearchwithfederatedlearning
AT easthampaul privacyfirsthealthresearchwithfederatedlearning
AT brownsteinjohns privacyfirsthealthresearchwithfederatedlearning
AT arcasblaiseagueray privacyfirsthealthresearchwithfederatedlearning
AT howellmichaeld privacyfirsthealthresearchwithfederatedlearning
AT hernandezjohn privacyfirsthealthresearchwithfederatedlearning