
Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics

The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query respo...


Bibliographic Details
Main authors: Venkatesaramani, Rajagopal, Wan, Zhiyu, Malin, Bradley A., Vorobeychik, Yevgeniy
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538482/
https://www.ncbi.nlm.nih.gov/pubmed/37217251
http://dx.doi.org/10.1101/gr.277674.123
_version_ 1785113316272635904
author Venkatesaramani, Rajagopal
Wan, Zhiyu
Malin, Bradley A.
Vorobeychik, Yevgeniy
author_facet Venkatesaramani, Rajagopal
Wan, Zhiyu
Malin, Bradley A.
Vorobeychik, Yevgeniy
author_sort Venkatesaramani, Rajagopal
collection PubMed
description The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web services called Beacons. However, even such limited releases are susceptible to likelihood ratio–based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the data set and those who are not. We further introduce highly scalable approaches for approximately solving the privacy–utility tradeoff problem when information is in the form of either summary statistics or presence/absence queries. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public data sets.
format Online
Article
Text
id pubmed-10538482
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-105384822023-09-29 Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics Venkatesaramani, Rajagopal Wan, Zhiyu Malin, Bradley A. Vorobeychik, Yevgeniy Genome Res Methods The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web services called Beacons. However, even such limited releases are susceptible to likelihood ratio–based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the data set and those who are not. We further introduce highly scalable approaches for approximately solving the privacy–utility tradeoff problem when information is in the form of either summary statistics or presence/absence queries. 
Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public data sets. Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538482/ /pubmed/37217251 http://dx.doi.org/10.1101/gr.277674.123 Text en © 2023 Venkatesaramani et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Methods
Venkatesaramani, Rajagopal
Wan, Zhiyu
Malin, Bradley A.
Vorobeychik, Yevgeniy
Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
title Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
title_full Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
title_fullStr Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
title_full_unstemmed Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
title_short Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
title_sort enabling tradeoffs in privacy and utility in genomic data beacons and summary statistics
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538482/
https://www.ncbi.nlm.nih.gov/pubmed/37217251
http://dx.doi.org/10.1101/gr.277674.123
work_keys_str_mv AT venkatesaramanirajagopal enablingtradeoffsinprivacyandutilityingenomicdatabeaconsandsummarystatistics
AT wanzhiyu enablingtradeoffsinprivacyandutilityingenomicdatabeaconsandsummarystatistics
AT malinbradleya enablingtradeoffsinprivacyandutilityingenomicdatabeaconsandsummarystatistics
AT vorobeychikyevgeniy enablingtradeoffsinprivacyandutilityingenomicdatabeaconsandsummarystatistics
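
Illustrative note: the likelihood ratio attack on Beacon responses mentioned in the description can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the formulation used in the article: it assumes a simplified Beacon likelihood ratio statistic of the kind commonly used in the membership-inference literature, and the function names, the genotyping-error parameter `delta`, the example allele frequencies, and the threshold are all hypothetical. It also shows why changing a single "present" response for a rare variant moves the attacker's score toward non-membership.

```python
import math


def beacon_lrt_score(alt_freqs, responses, n, delta=1e-6):
    """Log-likelihood ratio of H0 (target not in the Beacon) over H1 (target in it).

    alt_freqs: alternate-allele frequencies (each strictly between 0 and 1)
               for variants the target individual carries (assumed inputs).
    responses: Beacon answers for those variants (True = allele present).
    n:         number of individuals in the Beacon.
    delta:     assumed genotyping-error rate (hypothetical default).
    Lower scores are stronger evidence that the target is a member.
    """
    score = 0.0
    for f, present in zip(alt_freqs, responses):
        if not 0.0 < f < 1.0:
            raise ValueError("allele frequencies must lie strictly in (0, 1)")
        # log P(none of n individuals carries the allele) = 2n * log(1 - f)
        log_d_n = 2 * n * math.log1p(-f)
        log_d_n1 = 2 * (n - 1) * math.log1p(-f)
        if present:
            # log(1 - D_n) - log(1 - delta * D_{n-1})
            score += math.log(-math.expm1(log_d_n))
            score -= math.log1p(-delta * math.exp(log_d_n1))
        else:
            # log(D_n) - log(delta * D_{n-1}) = 2 * log(1 - f) - log(delta)
            score += 2 * math.log1p(-f) - math.log(delta)
    return score


def claims_membership(score, threshold):
    """Attacker's decision rule: claim membership when the score is below threshold."""
    return score < threshold


# Hypothetical example: three variants carried by the target, Beacon of 100 people.
freqs = [0.001, 0.002, 0.05]
honest = beacon_lrt_score(freqs, [True, True, True], n=100)
# Defender flips the response for the rarest variant to "absent".
masked = beacon_lrt_score(freqs, [False, True, True], n=100)

print("honest score:", honest, "member claimed:", claims_membership(honest, -1.0))
print("masked score:", masked, "member claimed:", claims_membership(masked, -1.0))
```

In this sketch, rare variants contribute the most negative terms to the score, so they are the natural candidates for suppression or modification; every altered response, however, also reduces the utility of the release, which is the tradeoff the description refers to.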