
Sharing extended summary data from contemporary genetics studies is unlikely to threaten subject privacy



Bibliographic Details
Main author: Bacanu, Silviu-Alin
Format: Online Article Text
Language: English
Published: Public Library of Science, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5491025/
https://www.ncbi.nlm.nih.gov/pubmed/28662067
http://dx.doi.org/10.1371/journal.pone.0179504
collection PubMed
description BACKGROUND: Starting from a forensic problem, Homer et al. showed that it is possible to detect whether an individual contributes as little as 0.5% of the DNA in a pool. The finding was extended to show that it is possible to detect whether a subject participated in a small homogeneous GWAS. We refer to this as detection of a subject belonging to a certain cohort (SBCC). Subsequently, Visscher and Hill showed that, for an ethnically homogeneous cohort, the power to detect an SBCC signal depends roughly on the ratio of the number of independent markers to the total sample size. However, it is not clear whether the same holds for more ethnically diverse cohorts. Later, Masca et al. proposed an SBCC test that regresses (i) the subject's genotype on (ii) the allele frequency in the cohort of interest, both expressed as departures from the assumed population frequency. They used simulations to show that this approach has greater SBCC detection power than the original Homer method but is impeded by population stratification.
APPROACH: To investigate the possibility of SBCC detection in multi-ethnic cohorts, we generalize the Masca et al. approach by theoretically deriving the correlation between a subject's genotype and the cohort reference allele frequencies (RAFs) for stratified cohorts. Based on the derived formula, we show theoretically that, because of background stratification noise, SBCC detection is unlikely even for mildly stratified cohorts larger than about a thousand subjects. Thus, for the vast majority of contemporary cohorts, the fear of compromising privacy via SBCC detection is unfounded.
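The regression idea attributed above to Masca et al. can be illustrated with a small simulation. This is a minimal sketch under assumptions that do not come from the record: a single ethnically homogeneous population, independent SNPs in Hardy-Weinberg equilibrium, population allele frequencies known exactly, and hypothetical helper names (`simulate_cohort`, `sbcc_regression_z`). It is neither Masca et al.'s implementation nor the stratified generalization derived in the article.

```python
import math
import random


def simulate_cohort(pop_freqs, n_subjects, rng):
    """Draw 0/1/2 reference-allele counts per SNP (Hardy-Weinberg assumed)."""
    return [[(rng.random() < p) + (rng.random() < p) for p in pop_freqs]
            for _ in range(n_subjects)]


def sbcc_regression_z(subject, cohort_freqs, pop_freqs):
    """No-intercept OLS of (subject genotype/2 - p) on (cohort RAF - p).

    Returns slope / standard error. Under these assumptions the expected
    slope is about 1 for a cohort member and about 0 for a non-member.
    """
    xs = [q - p for q, p in zip(cohort_freqs, pop_freqs)]
    ys = [g / 2 - p for g, p in zip(subject, pop_freqs)]
    sxx = sum(x * x for x in xs)
    slope = sum(x * y for x, y in zip(xs, ys)) / sxx
    resid_var = sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 1)
    return slope / math.sqrt(resid_var / sxx)


rng = random.Random(2017)
n_snps, n_cohort = 5000, 100  # hypothetical sizes: M independent SNPs, N subjects
pop_freqs = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
cohort = simulate_cohort(pop_freqs, n_cohort, rng)
# The shared summary data: reference allele frequencies (RAFs) of the cohort.
cohort_freqs = [sum(col) / (2 * n_cohort) for col in zip(*cohort)]

member = cohort[0]                                  # contributed to the RAFs
outsider = simulate_cohort(pop_freqs, 1, rng)[0]    # same population, not in cohort
z_member = sbcc_regression_z(member, cohort_freqs, pop_freqs)
z_outsider = sbcc_regression_z(outsider, cohort_freqs, pop_freqs)
print(f"member z = {z_member:.2f}, outsider z = {z_outsider:.2f}, "
      f"sqrt(M/N) = {math.sqrt(n_snps / n_cohort):.2f}")
```

Under these simplifying assumptions the member's statistic is roughly sqrt(M/N), which is consistent with the Visscher and Hill observation that power tracks the ratio of markers to sample size; a non-member's statistic stays near 0. Population stratification, the background noise the article argues defeats detection in larger cohorts, is deliberately not modeled here.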
id pubmed-5491025
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published online in PLoS One (Public Library of Science) on 2017-06-29 as a Research Article. © 2017 Silviu-Alin Bacanu. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article