Privacy preserving identification of population stratification for collaborative genomic research

The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborati...

Bibliographic Details
Main Authors: Dervishi, Leonard; Li, Wenbiao; Halimi, Anisa; Jiang, Xiaoqian; Vaidya, Jaideep; Ayday, Erman
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Genome Privacy and Security
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311306/
https://www.ncbi.nlm.nih.gov/pubmed/37387172
http://dx.doi.org/10.1093/bioinformatics/btad274
author Dervishi, Leonard
Li, Wenbiao
Halimi, Anisa
Jiang, Xiaoqian
Vaidya, Jaideep
Ayday, Erman
collection PubMed
description The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying genetic differences among individuals that arise from subpopulation structure. A common method for grouping the genomes of individuals by ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework that uses PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server scheme, the server first trains a global PCA model on a publicly available genomic dataset that contains individuals from multiple populations. Each collaborator (client) then uses the global PCA model to reduce the dimensionality of its local data. After adding noise to achieve local differential privacy (LDP), the collaborators send metadata about their research datasets (in the form of their local PCA outputs) to the server, which then aligns the local PCA results to identify the genetic differences among collaborators’ datasets. Our results on real genomic data show that the proposed framework can perform population stratification analysis with high accuracy while preserving the privacy of the research participants.
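The client-server flow summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation. The sketch assumes a Laplace mechanism applied to clipped PCA coordinates as the LDP step, and it uses nearest-centroid assignment as a stand-in for the server-side alignment; the record does not specify either choice. All function names, the `clip` bound, and the synthetic two-population data are hypothetical.

```python
import numpy as np

def fit_global_pca(public_genotypes, n_components=2):
    """Server: fit PCA on a public panel (rows = individuals, columns = SNPs)."""
    mean = public_genotypes.mean(axis=0)
    # Right singular vectors of the centered matrix are the principal axes.
    _, _, vt = np.linalg.svd(public_genotypes - mean, full_matrices=False)
    return mean, vt[:n_components].T  # components: (n_snps, n_components)

def project(genotypes, mean, components):
    """Client: reduce local genotypes with the shared global model."""
    return (genotypes - mean) @ components

def add_ldp_noise(projected, epsilon, clip, rng):
    """Client: clip each coordinate to [-clip, clip], then add Laplace noise.

    Assumption: for a clipped d-dimensional vector the L1 sensitivity is
    2 * clip * d, so the Laplace scale is 2 * clip * d / epsilon.
    """
    clipped = np.clip(projected, -clip, clip)
    scale = 2.0 * clip * projected.shape[1] / epsilon
    return clipped + rng.laplace(0.0, scale, size=clipped.shape)

def assign_populations(noisy, centroids):
    """Server: assign each noisy point to the nearest reference centroid."""
    dists = np.linalg.norm(noisy[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def demo(epsilon, seed=0):
    """Run the flow on synthetic data; return the assignment accuracy."""
    rng = np.random.default_rng(seed)
    n_snps = 200
    freq_a = rng.uniform(0.1, 0.5, n_snps)
    freqs = [freq_a, freq_a + 0.4]  # two well-separated synthetic populations

    def sample(n):
        labels = np.repeat([0, 1], n)
        geno = np.vstack(
            [rng.binomial(2, f, size=(n, n_snps)) for f in freqs]
        ).astype(float)
        return geno, labels

    public, public_labels = sample(100)   # server's public reference panel
    local, local_labels = sample(50)      # one collaborator's private data

    mean, comps = fit_global_pca(public)
    ref = project(public, mean, comps)
    centroids = np.array(
        [ref[public_labels == k].mean(axis=0) for k in (0, 1)]
    )
    noisy = add_ldp_noise(project(local, mean, comps), epsilon, 10.0, rng)
    return float((assign_populations(noisy, centroids) == local_labels).mean())

if __name__ == "__main__":
    for eps in (1.0, 10.0, 100.0):
        print(f"epsilon={eps}: accuracy={demo(eps):.2f}")
```

Smaller values of `epsilon` add more noise, so the script prints the accuracy at several privacy levels to show the privacy-utility trade-off on this toy data.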
format Online
Article
Text
id pubmed-10311306
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10311306 2023-07-01 Privacy preserving identification of population stratification for collaborative genomic research Dervishi, Leonard; Li, Wenbiao; Halimi, Anisa; Jiang, Xiaoqian; Vaidya, Jaideep; Ayday, Erman Bioinformatics Genome Privacy and Security
Oxford University Press 2023-06-30 /pmc/articles/PMC10311306/ /pubmed/37387172 http://dx.doi.org/10.1093/bioinformatics/btad274 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Privacy preserving identification of population stratification for collaborative genomic research
topic Genome Privacy and Security