Assessing transcriptomic reidentification risks using discriminative sequence models
Gene expression data provide molecular insights into the functional impact of genetic variation, for example, through expression quantitative trait loci (eQTLs). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression prof...
Main authors: | Sadhuka, Shuvom; Fridman, Daniel; Berger, Bonnie; Cho, Hyunghoon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2023 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538488/ https://www.ncbi.nlm.nih.gov/pubmed/37541758 http://dx.doi.org/10.1101/gr.277699.123 |
_version_ | 1785113317465915392 |
---|---|
author | Sadhuka, Shuvom; Fridman, Daniel; Berger, Bonnie; Cho, Hyunghoon |
collection | PubMed |
description | Gene expression data provide molecular insights into the functional impact of genetic variation, for example, through expression quantitative trait loci (eQTLs). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression profiles could be matched to genotype profiles of the same individuals in another data set, known as a linking attack. Prior works showing such a risk could analyze only the subset of eQTLs that are independent, owing to restrictive model assumptions, leaving the full extent of this risk incompletely understood. To address this challenge, we introduce the discriminative sequence model (DSM), a novel probabilistic framework for predicting a sequence of genotypes based on gene expression data. By modeling the joint distribution over all known eQTLs in a genomic region, DSM improves the power of linking attacks with necessary calibration for linkage disequilibrium and redundant predictive signals. We show greater linking accuracy of DSM compared with existing approaches across a range of attack scenarios and data sets including up to 22,288 individuals, suggesting that DSM helps uncover a substantial additional risk overlooked by previous studies. Our work provides a unified framework for assessing the privacy risks of sharing diverse omics data sets beyond transcriptomics. |
format | Online Article Text |
id | pubmed-10538488 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10538488 2023-09-29 Assessing transcriptomic reidentification risks using discriminative sequence models Sadhuka, Shuvom Fridman, Daniel Berger, Bonnie Cho, Hyunghoon Genome Res Methods
Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538488/ /pubmed/37541758 http://dx.doi.org/10.1101/gr.277699.123 Text en © 2023 Sadhuka et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/. |
title | Assessing transcriptomic reidentification risks using discriminative sequence models |
topic | Methods |