Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores

Some organizations such as 23andMe and the UK Biobank have large genomic databases that they re-use for multiple different genome-wide association studies. Even research studies that compile smaller genomic databases often utilize these databases to investigate many related traits. It is common for...

Bibliographic Details
Main authors: Paige, Brooks; Bell, James; Bellet, Aurélien; Gascón, Adrià; Ezer, Daphne
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc., publishers 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8165474/
https://www.ncbi.nlm.nih.gov/pubmed/33400590
http://dx.doi.org/10.1089/cmb.2020.0445
author Paige, Brooks
Bell, James
Bellet, Aurélien
Gascón, Adrià
Ezer, Daphne
collection PubMed
description Some organizations such as 23andMe and the UK Biobank have large genomic databases that they reuse for multiple genome-wide association studies. Even research studies that compile smaller genomic databases often use these databases to investigate many related traits. It is common for the study to report a genetic risk score (GRS) model for each trait within the publication. Here, we show that under some circumstances, these GRS models can be used to recover the genetic variants of individuals in these genomic databases (a reconstruction attack). In particular, if two GRS models are trained by using a largely overlapping set of participants, it is often possible to determine the genotype of each individual who was used to train one GRS model but not the other. We demonstrate this theoretically and experimentally by analyzing the Cornell Dog Genome database. The accuracy of our reconstruction attack depends on how accurately we can estimate the rate of co-occurrence of pairs of single nucleotide polymorphisms within the private database, so if this aggregate information is ever released, it would drastically reduce the security of a private genomic database. Caution should be applied when using the same database for multiple analyses, especially when a small number of individuals are included in or excluded from one part of the study.
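The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes, for illustration only, that each GRS model is an ordinary least-squares fit, that the two models differ by exactly one extra participant, and that the attacker knows the SNP co-occurrence matrix of the shared participants exactly. All function and variable names below are hypothetical. Under those assumptions the normal equations give `A @ (beta_with - beta_without) = r * x`, where `A` is the co-occurrence matrix, `x` the extra genotype, and `r` that individual's residual, so `x` is recoverable up to a scale that the 0/1/2 genotype coding then fixes.

```python
import numpy as np

def fit_grs(X, y):
    """Least-squares weights, standing in here for a published GRS model (assumption)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def reconstruct_genotype(beta_without, beta_with, cooccurrence):
    """Recover the extra individual's 0/1/2 genotype from two weight vectors.

    cooccurrence is X.T @ X for the shared participants. For least squares,
    cooccurrence @ (beta_with - beta_without) equals the extra genotype times
    an unknown non-zero scalar (the extra individual's residual).
    """
    direction = cooccurrence @ (beta_with - beta_without)
    pivot = direction[np.argmax(np.abs(direction))]
    best = None
    # The largest genotype entry is either 1 or 2; keep whichever choice
    # lands closest to whole numbers after rescaling.
    for top in (1, 2):
        candidate = direction / pivot * top
        rounded = np.rint(candidate)
        error = np.abs(candidate - rounded).max()
        if best is None or error < best[0]:
            best = (error, rounded.astype(int))
    return best[1]

# Synthetic data only: 200 shared participants, 8 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 200, 8
X_shared = rng.integers(0, 3, size=(n, p)).astype(float)
w = rng.normal(size=p)
y_shared = X_shared @ w + rng.normal(scale=0.5, size=n)

# One extra participant appears in the second study only.
x_extra = np.array([0, 1, 2, 1, 0, 2, 0, 1], dtype=float)
y_extra = x_extra @ w + 1.5

beta_without = fit_grs(X_shared, y_shared)
beta_with = fit_grs(np.vstack([X_shared, x_extra]), np.append(y_shared, y_extra))

recovered = reconstruct_genotype(beta_without, beta_with, X_shared.T @ X_shared)
print(recovered)
```

In this sketch the recovery is exact only because the co-occurrence matrix is known exactly; with an estimated matrix the rescaled vector would drift away from whole numbers, which matches the abstract's point that accuracy depends on how well those co-occurrence rates can be estimated. A genotype containing only 0s and 2s would also be indistinguishable from one containing only 0s and 1s in this toy setting.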
format Online
Article
Text
id pubmed-8165474
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Mary Ann Liebert, Inc., publishers
record_format MEDLINE/PubMed
spelling pubmed-8165474 2021-06-01 Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores. Paige, Brooks; Bell, James; Bellet, Aurélien; Gascón, Adrià; Ezer, Daphne. J Comput Biol. Preface. Mary Ann Liebert, Inc., publishers 2021-05-01 2021-05-20 /pmc/articles/PMC8165474/ /pubmed/33400590 http://dx.doi.org/10.1089/cmb.2020.0445 Text en © Brooks Paige, et al., 2021. Published by Mary Ann Liebert, Inc.
https://creativecommons.org/licenses/by/4.0/ This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores
topic Preface