
Haplotype-based membership inference from summary genomic data

MOTIVATION: The availability of human genomic data, together with the enhanced capacity to process them, is leading to transformative technological advances in biomedical science and engineering. However, the public dissemination of such data has been difficult due to privacy concerns. Specifically,...


Bibliographic Details
Main authors: Bu, Diyue; Wang, Xiaofeng; Tang, Haixu
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Genome Privacy and Security
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275351/
https://www.ncbi.nlm.nih.gov/pubmed/34252973
http://dx.doi.org/10.1093/bioinformatics/btab305
author Bu, Diyue
Wang, Xiaofeng
Tang, Haixu
collection PubMed
description MOTIVATION: The availability of human genomic data, together with the enhanced capacity to process them, is leading to transformative technological advances in biomedical science and engineering. However, the public dissemination of such data has been difficult due to privacy concerns. Specifically, it has been shown that the presence of a human subject in a case group can be inferred from the shared summary statistics of the group, e.g. the allele frequencies, or even the presence/absence of genetic variants (e.g. shared by the Beacon project) in the group. These methods rely on the availability of the target’s genome, i.e. the DNA profile of a target human subject, and thus are often referred to as membership inference methods. RESULTS: In this article, we demonstrate that haplotypes, i.e. sequences of single nucleotide variations (SNVs) showing strong genetic linkage in human genome databases, may be inferred from summary genomic data without using a target’s genome. Furthermore, novel haplotypes that did not appear in the database may be reconstructed solely from the allele frequencies of genomic datasets. These reconstructed haplotypes can be used in a haplotype-based membership inference algorithm to identify target subjects in a case group with greater power than existing methods based on SNVs. AVAILABILITY AND IMPLEMENTATION: The implementation of the membership inference algorithms is available at https://github.com/diybu/Haplotype-based-membership-inferences.
format Online
Article
Text
id pubmed-8275351
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8275351 2021-07-13 Haplotype-based membership inference from summary genomic data. Bu, Diyue; Wang, Xiaofeng; Tang, Haixu. Bioinformatics. Genome Privacy and Security. Oxford University Press 2021-07-12 /pmc/articles/PMC8275351/ /pubmed/34252973 http://dx.doi.org/10.1093/bioinformatics/btab305 Text en © The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Haplotype-based membership inference from summary genomic data
topic Genome Privacy and Security