
Privacy Risks from Genomic Data-Sharing Beacons

The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries—such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g...


Bibliographic Details
Main authors: Shringarpure, Suyash S., Bustamante, Carlos D.
Format: Online Article Text
Language: English
Published: Elsevier 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4667107/
https://www.ncbi.nlm.nih.gov/pubmed/26522470
http://dx.doi.org/10.1016/j.ajhg.2015.09.010
author Shringarpure, Suyash S.
Bustamante, Carlos D.
collection PubMed
description The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries—such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”—with either “yes” or “no.” Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
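The query model and likelihood-ratio idea in the description above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy beacon contents, the function names `beacon_query` and `log_likelihood_ratio`, and the two response probabilities (0.6 for a non-member, 0.99 for a member) are assumptions chosen only for illustration. A real test would need to derive those probabilities from a model of the beacon.

```python
import math


def beacon_query(beacon_alleles, chrom, pos, allele):
    """Answer an allele-presence query with True ("yes") or False ("no")."""
    return (chrom, pos, allele) in beacon_alleles


def log_likelihood_ratio(responses, p_yes_if_absent, p_yes_if_member):
    """Sum log P(response | member) - log P(response | absent) over queries.

    Both probabilities are hypothetical inputs and must lie strictly
    between 0 and 1 so that every logarithm is defined.
    """
    for p in (p_yes_if_absent, p_yes_if_member):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    llr = 0.0
    for yes in responses:
        if yes:
            llr += math.log(p_yes_if_member) - math.log(p_yes_if_absent)
        else:
            llr += math.log(1.0 - p_yes_if_member) - math.log(1.0 - p_yes_if_absent)
    return llr


# Toy beacon: the set of (chromosome, position, allele) triples it holds.
beacon_alleles = {("1", 11272, "A"), ("1", 20500, "G"), ("2", 3100, "T")}

# Alleles carried by the genome whose membership is being tested (toy data).
target_alleles = [("1", 11272, "A"), ("1", 20500, "G"), ("2", 3100, "T")]

# Query the beacon once per allele and score the yes/no answers.
responses = [beacon_query(beacon_alleles, *q) for q in target_alleles]
llr = log_likelihood_ratio(responses, p_yes_if_absent=0.6, p_yes_if_member=0.99)
```

A positive score favors membership and a negative score favors absence. Choosing a decision threshold for a given false-positive rate is outside the scope of this sketch.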
format Online
Article
Text
id pubmed-4667107
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-4667107 2016-05-05 Privacy Risks from Genomic Data-Sharing Beacons Shringarpure, Suyash S. Bustamante, Carlos D. Am J Hum Genet Article
Elsevier 2015-11-05 2015-10-29 /pmc/articles/PMC4667107/ /pubmed/26522470 http://dx.doi.org/10.1016/j.ajhg.2015.09.010 Text en © 2015 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Privacy Risks from Genomic Data-Sharing Beacons
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4667107/
https://www.ncbi.nlm.nih.gov/pubmed/26522470
http://dx.doi.org/10.1016/j.ajhg.2015.09.010