Assessing transcriptomic reidentification risks using discriminative sequence models

Gene expression data provide molecular insights into the functional impact of genetic variation, for example, through expression quantitative trait loci (eQTLs). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression prof...

Bibliographic Details
Main authors: Sadhuka, Shuvom; Fridman, Daniel; Berger, Bonnie; Cho, Hyunghoon
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538488/
https://www.ncbi.nlm.nih.gov/pubmed/37541758
http://dx.doi.org/10.1101/gr.277699.123
_version_ 1785113317465915392
author Sadhuka, Shuvom
Fridman, Daniel
Berger, Bonnie
Cho, Hyunghoon
author_sort Sadhuka, Shuvom
collection PubMed
description Gene expression data provide molecular insights into the functional impact of genetic variation, for example, through expression quantitative trait loci (eQTLs). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression profiles could be matched to genotype profiles of the same individuals in another data set, known as a linking attack. Prior works showing such a risk could analyze only a fraction of eQTLs that are independent, owing to restrictive model assumptions, leaving the full extent of this risk incompletely understood. To address this challenge, we introduce the discriminative sequence model (DSM), a novel probabilistic framework for predicting a sequence of genotypes based on gene expression data. By modeling the joint distribution over all known eQTLs in a genomic region, DSM improves the power of linking attacks with necessary calibration for linkage disequilibrium and redundant predictive signals. We show greater linking accuracy of DSM compared with existing approaches across a range of attack scenarios and data sets including up to 22,288 individuals, suggesting that DSM helps uncover a substantial additional risk overlooked by previous studies. Our work provides a unified framework for assessing the privacy risks of sharing diverse omics data sets beyond transcriptomics.
format Online
Article
Text
id pubmed-10538488
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-10538488 2023-09-29 Assessing transcriptomic reidentification risks using discriminative sequence models Sadhuka, Shuvom Fridman, Daniel Berger, Bonnie Cho, Hyunghoon Genome Res Methods Gene expression data provide molecular insights into the functional impact of genetic variation, for example, through expression quantitative trait loci (eQTLs). With an improving understanding of the association between genotypes and gene expression comes a greater concern that gene expression profiles could be matched to genotype profiles of the same individuals in another data set, known as a linking attack. Prior works showing such a risk could analyze only a fraction of eQTLs that are independent, owing to restrictive model assumptions, leaving the full extent of this risk incompletely understood. To address this challenge, we introduce the discriminative sequence model (DSM), a novel probabilistic framework for predicting a sequence of genotypes based on gene expression data. By modeling the joint distribution over all known eQTLs in a genomic region, DSM improves the power of linking attacks with necessary calibration for linkage disequilibrium and redundant predictive signals. We show greater linking accuracy of DSM compared with existing approaches across a range of attack scenarios and data sets including up to 22,288 individuals, suggesting that DSM helps uncover a substantial additional risk overlooked by previous studies. Our work provides a unified framework for assessing the privacy risks of sharing diverse omics data sets beyond transcriptomics.
Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538488/ /pubmed/37541758 http://dx.doi.org/10.1101/gr.277699.123 Text en © 2023 Sadhuka et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
title Assessing transcriptomic reidentification risks using discriminative sequence models
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538488/
https://www.ncbi.nlm.nih.gov/pubmed/37541758
http://dx.doi.org/10.1101/gr.277699.123