Inference attacks against differentially private query results from genomic datasets including dependent tuples


Bibliographic Details
Main Authors: Almadhoun, Nour; Ayday, Erman; Ulusoy, Özgür
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355303/
https://www.ncbi.nlm.nih.gov/pubmed/32657411
http://dx.doi.org/10.1093/bioinformatics/btaa475
author Almadhoun, Nour
Ayday, Erman
Ulusoy, Özgür
collection PubMed
description MOTIVATION: The rapid decrease in sequencing costs has led to a revolution in medical research and clinical care. Today, researchers have access to large genomic datasets to study associations between variants and complex traits. However, the availability of such genomic datasets also raises new privacy concerns about the personal information of participants in genomic studies. Differential privacy (DP) is a rigorous privacy concept that has received widespread interest for sharing summary statistics from genomic datasets while protecting the privacy of participants against inference attacks. However, DP has a known drawback: it does not consider the correlation between dataset tuples. Therefore, the privacy guarantees of DP-based mechanisms may degrade if the dataset includes dependent tuples, which is a common situation for genomic datasets due to the inherent correlations between the genomes of family members. RESULTS: In this article, using two real-life genomic datasets, we show that exploiting the correlation between dataset participants results in a significant information leak from differentially private results of complex queries. We formulate this as an attribute inference attack and show the privacy loss in minor allele frequency (MAF) and chi-square queries. Our results show that, using the results of differentially private MAF queries and the dependency between tuples, an adversary can reveal up to 50% more sensitive information about the genome of a target (compared to the original privacy guarantees of standard DP-based mechanisms), while differentially private chi-square queries can reveal up to 40% more sensitive information. Furthermore, we show that the adversary can use the inferred genomic data obtained from the attribute inference attack to infer the membership of a target in another genomic dataset (e.g. one associated with a sensitive trait). Using a log-likelihood-ratio test, our results also show that the inference power of the adversary can be significantly high in such an attack, even when using inferred (and hence partially incorrect) genomes. AVAILABILITY AND IMPLEMENTATION: https://github.com/nourmadhoun/Inference-Attacks-Differential-Privacy
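The two building blocks named in the description, a differentially private MAF query and a log-likelihood-ratio membership test, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). All function names here are hypothetical, genotypes are assumed to be coded as 0, 1 or 2 minor alleles per SNP, the noise is the textbook Laplace mechanism, and the membership statistic is a generic allele-frequency log-likelihood ratio that may differ from the exact test used in the article.

```python
import math
import random


def maf(genotypes):
    """Minor allele frequency of one SNP; each genotype is 0, 1 or 2 minor alleles."""
    return sum(genotypes) / (2 * len(genotypes))


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_maf(genotypes, epsilon, rng):
    """Laplace-mechanism MAF query (hypothetical helper, standard DP calibration)."""
    # Changing one participant's genotype moves the MAF by at most 2 / (2n) = 1 / n.
    # This sensitivity assumes independent tuples, which is the assumption that
    # breaks down when relatives are in the dataset.
    sensitivity = 1 / len(genotypes)
    noisy = maf(genotypes) + laplace_noise(sensitivity / epsilon, rng)
    # Clamping to [0, 1] is post-processing and does not weaken the DP guarantee.
    return min(1.0, max(0.0, noisy))


def llr_membership(target, pool_freqs, ref_freqs, eps=1e-6):
    """Generic log-likelihood-ratio statistic over SNPs.

    target: genotypes (0/1/2) of the target, possibly inferred and partly wrong.
    pool_freqs: MAFs released for the dataset under attack.
    ref_freqs: MAFs of a reference population.
    Positive values favor the hypothesis that the target is in the dataset.
    """
    total = 0.0
    for g, p_pool, p_ref in zip(target, pool_freqs, ref_freqs):
        # Keep frequencies away from 0 and 1 so the logarithms stay finite.
        p_pool = min(1 - eps, max(eps, p_pool))
        p_ref = min(1 - eps, max(eps, p_ref))
        total += g * math.log(p_pool / p_ref)
        total += (2 - g) * math.log((1 - p_pool) / (1 - p_ref))
    return total
```

In this sketch an adversary would compare `llr_membership(...)` against a threshold chosen for a target false-positive rate. The smaller the `epsilon` passed to `dp_maf`, the noisier the released frequencies and the weaker that test becomes, as long as the independence assumption behind the sensitivity bound holds.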
format Online
Article
Text
id pubmed-7355303
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7355303 2020-07-16 Inference attacks against differentially private query results from genomic datasets including dependent tuples Almadhoun, Nour Ayday, Erman Ulusoy, Özgür Bioinformatics Genome Privacy and Security Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355303/ /pubmed/32657411 http://dx.doi.org/10.1093/bioinformatics/btaa475 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Inference attacks against differentially private query results from genomic datasets including dependent tuples
topic Genome Privacy and Security