Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics
The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query respo...
Main authors: | Venkatesaramani, Rajagopal; Wan, Zhiyu; Malin, Bradley A.; Vorobeychik, Yevgeniy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2023 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538482/ https://www.ncbi.nlm.nih.gov/pubmed/37217251 http://dx.doi.org/10.1101/gr.277674.123 |
_version_ | 1785113316272635904 |
---|---|
author | Venkatesaramani, Rajagopal Wan, Zhiyu Malin, Bradley A. Vorobeychik, Yevgeniy |
author_sort | Venkatesaramani, Rajagopal |
collection | PubMed |
description | The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web services called Beacons. However, even such limited releases are susceptible to likelihood ratio–based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the data set and those who are not. We further introduce highly scalable approaches for approximately solving the privacy–utility tradeoff problem when information is in the form of either summary statistics or presence/absence queries. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public data sets. |
format | Online Article Text |
id | pubmed-10538482 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10538482 2023-09-29 Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics Venkatesaramani, Rajagopal Wan, Zhiyu Malin, Bradley A. Vorobeychik, Yevgeniy Genome Res Methods Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538482/ /pubmed/37217251 http://dx.doi.org/10.1101/gr.277674.123 Text en © 2023 Venkatesaramani et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Methods Venkatesaramani, Rajagopal Wan, Zhiyu Malin, Bradley A. Vorobeychik, Yevgeniy Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics |
title | Enabling tradeoffs in privacy and utility in genomic data Beacons and summary statistics |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538482/ https://www.ncbi.nlm.nih.gov/pubmed/37217251 http://dx.doi.org/10.1101/gr.277674.123 |