Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery

Objective: To propose a new approach to privacy-preserving data selection, which helps data users access human genomic datasets efficiently without undermining patients’ privacy. Methods: Our idea is to let each data owner publish a set of differentially private pilot data, on which a data user ca...

Bibliographic Details
Main authors: Zhao, Yongan, Wang, Xiaofeng, Jiang, Xiaoqian, Ohno-Machado, Lucila, Tang, Haixu
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4433380/
https://www.ncbi.nlm.nih.gov/pubmed/25352565
http://dx.doi.org/10.1136/amiajnl-2014-003043
author Zhao, Yongan
Wang, Xiaofeng
Jiang, Xiaoqian
Ohno-Machado, Lucila
Tang, Haixu
collection PubMed
description Objective: To propose a new approach to privacy-preserving data selection, which helps data users access human genomic datasets efficiently without undermining patients’ privacy. Methods: Our idea is to let each data owner publish a set of differentially private pilot data, on which a data user can test-run arbitrary association-test algorithms, including those not known to the data owner a priori. We developed a suite of new techniques, including a pilot-data generation approach that leverages the linkage disequilibrium in the human genome to preserve both the utility of the data and the privacy of the patients, and a utility evaluation method that helps the user assess the value of the real data from its pilot version with high confidence. Results: We evaluated our approach on real human genomic data using four popular association tests. Our study shows that the proposed approach can help data users make the right choices in most cases. Conclusions: Even though the pilot data cannot be directly used for scientific discovery, they provide a useful indication of which datasets are more likely to be useful to data users, who can therefore approach the appropriate data owners to gain access to the data.
format Online
Article
Text
id pubmed-4433380
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4433380 2016-01-01 Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery Zhao, Yongan Wang, Xiaofeng Jiang, Xiaoqian Ohno-Machado, Lucila Tang, Haixu J Am Med Inform Assoc Research and Applications Oxford University Press 2015-01 2014-10-28 /pmc/articles/PMC4433380/ /pubmed/25352565 http://dx.doi.org/10.1136/amiajnl-2014-003043 Text en © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.
http://creativecommons.org/licenses/by-nc/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. For numbered affiliations see end of article.
title Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery
topic Research and Applications