Caught you: threats to confidentiality due to the public release of large-scale genetic data sets

BACKGROUND: Large-scale genetic data sets are frequently shared with other research groups and even released on the Internet to allow for secondary analysis. Study participants are usually not informed about such data sharing because data sets are assumed to be anonymous after stripping off personal...

Bibliographic Details
Main Author:	Wjst, Matthias
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Debate
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3022540/ https://www.ncbi.nlm.nih.gov/pubmed/21190545 http://dx.doi.org/10.1186/1472-6939-11-21

_version_	1782196512698138624
author	Wjst, Matthias
collection	PubMed
description	BACKGROUND: Large-scale genetic data sets are frequently shared with other research groups and even released on the Internet to allow for secondary analysis. Study participants are usually not informed about such data sharing because data sets are assumed to be anonymous after stripping off personal identifiers. DISCUSSION: The assumption of anonymity of genetic data sets, however, is tenuous because genetic data are intrinsically self-identifying. Two types of re-identification are possible: the "Netflix" type and the "profiling" type. The "Netflix" type needs another small genetic data set, usually with less than 100 SNPs but including a personal identifier. This second data set might originate from another clinical examination, a study of leftover samples or forensic testing. When merged to the primary, unidentified set it will re-identify all samples of that individual. Even with no second data set at hand, a "profiling" strategy can be developed to extract as much information as possible from a sample collection. Starting with the identification of ethnic subgroups along with predictions of body characteristics and diseases, the asthma kids case as a real-life example is used to illustrate that approach. SUMMARY: Depending on the degree of supplemental information, there is a good chance that at least a few individuals can be identified from an anonymized data set. Any re-identification, however, may potentially harm study participants because it will release individual genetic disease risks to the public.
format	Text
id	pubmed-3022540
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-30225402011-01-19 Caught you: threats to confidentiality due to the public release of large-scale genetic data sets Wjst, Matthias BMC Med Ethics Debate BACKGROUND: Large-scale genetic data sets are frequently shared with other research groups and even released on the Internet to allow for secondary analysis. Study participants are usually not informed about such data sharing because data sets are assumed to be anonymous after stripping off personal identifiers. DISCUSSION: The assumption of anonymity of genetic data sets, however, is tenuous because genetic data are intrinsically self-identifying. Two types of re-identification are possible: the "Netflix" type and the "profiling" type. The "Netflix" type needs another small genetic data set, usually with less than 100 SNPs but including a personal identifier. This second data set might originate from another clinical examination, a study of leftover samples or forensic testing. When merged to the primary, unidentified set it will re-identify all samples of that individual. Even with no second data set at hand, a "profiling" strategy can be developed to extract as much information as possible from a sample collection. Starting with the identification of ethnic subgroups along with predictions of body characteristics and diseases, the asthma kids case as a real-life example is used to illustrate that approach. SUMMARY: Depending on the degree of supplemental information, there is a good chance that at least a few individuals can be identified from an anonymized data set. Any re-identification, however, may potentially harm study participants because it will release individual genetic disease risks to the public. BioMed Central 2010-12-29 /pmc/articles/PMC3022540/ /pubmed/21190545 http://dx.doi.org/10.1186/1472-6939-11-21 Text en Copyright ©2010 Wjst; licensee BioMed Central Ltd. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Debate Wjst, Matthias Caught you: threats to confidentiality due to the public release of large-scale genetic data sets
title	Caught you: threats to confidentiality due to the public release of large-scale genetic data sets
topic	Debate
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3022540/ https://www.ncbi.nlm.nih.gov/pubmed/21190545 http://dx.doi.org/10.1186/1472-6939-11-21
work_keys_str_mv	AT wjstmatthias caughtyouthreatstoconfidentialityduetothepublicreleaseoflargescalegeneticdatasets
