Privacy-preserving federated genome-wide association studies via dynamic sampling

Bibliographic Details
Main authors: Wang, Xinyue; Dervishi, Leonard; Li, Wentao; Ayday, Erman; Jiang, Xiaoqian; Vaidya, Jaideep
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10612407/
https://www.ncbi.nlm.nih.gov/pubmed/37856329
http://dx.doi.org/10.1093/bioinformatics/btad639
collection PubMed
description MOTIVATION: Genome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal secure guarantees, the substantial communication and computational overheads hinder the practical application of large-scale collaborative GWAS. RESULTS: This work introduces an efficient framework for conducting collaborative GWAS on distributed datasets, maintaining data privacy without compromising the accuracy of the results. We propose a novel two-step strategy aimed at reducing communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a commonly used statistical method for identifying associations between genetic markers and the phenotype of interest. We evaluate our proposed methods using two real genomic datasets and demonstrate their robustness in the presence of between-study heterogeneity and skewed phenotype distributions using a variety of experimental settings. The empirical results show the efficiency and applicability of the proposed method and the promise for its application for large-scale collaborative GWAS. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/amioamo/TDS.
id pubmed-10612407
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Bioinformatics, Original Paper, published 2023-10-19.
© The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Paper
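The description field states that the approach is instantiated with logistic regression to test associations between genetic markers and a phenotype. The Python sketch below illustrates only that statistical building block. It fits a single-SNP logistic model across several sites by Newton-Raphson, and each site shares only its score vector and Fisher information. This is a generic textbook federated pattern, not the authors' two-step dynamic sampling method. It adds none of the privacy protections the record refers to, runs on simulated data, and every function name (`local_summary`, `aggregate`, `federated_snp_test`, `simulate_site`) is hypothetical.

```python
import math
import random


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def local_summary(genotypes, phenotypes, beta):
    """Hypothetical per-site step: score vector and Fisher information for
    logit P(y = 1) = b0 + b1 * g. Only these five numbers leave the site."""
    s0 = s1 = 0.0
    i00 = i01 = i11 = 0.0
    for g, y in zip(genotypes, phenotypes):
        p = sigmoid(beta[0] + beta[1] * g)
        r = y - p            # residual contributes to the score
        w = p * (1.0 - p)    # weight contributes to the information matrix
        s0 += r
        s1 += r * g
        i00 += w
        i01 += w * g
        i11 += w * g * g
    return s0, s1, i00, i01, i11


def aggregate(sites, beta):
    """Hypothetical coordinator: sums per-site summaries (plaintext, no protection)."""
    totals = [0.0] * 5
    for genotypes, phenotypes in sites:
        for k, v in enumerate(local_summary(genotypes, phenotypes, beta)):
            totals[k] += v
    return totals


def federated_snp_test(sites, max_iter=25, tol=1e-10):
    """Newton-Raphson on the pooled log-likelihood.
    Returns (b0, b1, se_b1, two-sided Wald p-value for b1)."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        s0, s1, i00, i01, i11 = aggregate(sites, beta)
        det = i00 * i11 - i01 * i01
        # Newton step beta += I^{-1} s, using the closed-form 2x2 inverse.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        beta = [beta[0] + d0, beta[1] + d1]
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard error of b1 from the inverse information at the final estimate.
    _, _, i00, i01, i11 = aggregate(sites, beta)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    z = beta[1] / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta[0], beta[1], se_b1, p_value


def simulate_site(rng, n, maf=0.3, b0=-1.0, b1=0.8):
    """Synthetic data only: additive genotype coding 0/1/2, binary phenotype."""
    genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    phenotypes = [int(rng.random() < sigmoid(b0 + b1 * g)) for g in genotypes]
    return genotypes, phenotypes


if __name__ == "__main__":
    rng = random.Random(7)
    sites = [simulate_site(rng, 2000) for _ in range(3)]
    b0, b1, se, p = federated_snp_test(sites)
    print(f"b1 = {b1:.3f}, SE = {se:.3f}, p = {p:.2e}")
```

Summing the per-site statistics reproduces the pooled statistics exactly, so the federated fit agrees with a single fit on the combined data up to floating-point error. In this naive form, every Newton iteration costs one round of communication between the sites and the coordinator.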