
Efficient privacy-preserving whole-genome variant queries



Bibliographic details
Main authors: Akgün, Mete; Pfeifer, Nico; Kohlbacher, Oliver
Format: Online article, text
Language: English
Published: Oxford University Press, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004657/
https://www.ncbi.nlm.nih.gov/pubmed/35150254
http://dx.doi.org/10.1093/bioinformatics/btac070
author Akgün, Mete
Pfeifer, Nico
Kohlbacher, Oliver
collection PubMed
description MOTIVATION: Diagnosis and treatment decisions based on genomic data have become widespread as the cost of genome sequencing has steadily decreased. In this context, disease–gene association studies are of great importance. However, genomic data are particularly sensitive compared with other data types and contain information about individuals and their relatives. Many studies have shown that this information can be obtained from query-response pairs on genomic databases. In this work, we propose a method that uses secure multi-party computation to query genomic databases in a privacy-preserving manner. The proposed solution privately outsources genomic data from arbitrarily many sources to two non-colluding proxies and allows genomic databases to be stored safely in semi-honest cloud environments. It provides data privacy, query privacy and output privacy by using XOR-based sharing and, unlike previous solutions, allows queries to run efficiently on hundreds of thousands of genomes. RESULTS: We measure the performance of our solution with parameters similar to those of real-world applications. A genomic database with 3 000 000 variants can be queried with five genomic query predicates in under 400 ms. Querying 1 048 576 genomes, each containing 1 000 000 variants, for the presence of five different query variants takes approximately 6 min with a small amount of dedicated hardware and connectivity. These execution times are in the right range to enable real-world applications in medical research and healthcare. Unlike previous studies, multiple databases can be queried with response times fast enough for practical application. To the best of our knowledge, this is the first solution that provides this performance for querying large-scale genomic data. AVAILABILITY AND IMPLEMENTATION: https://gitlab.com/DIFUTURE/privacy-preserving-variant-queries. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9004657
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9004657 (2022-04-13) · Efficient privacy-preserving whole-genome variant queries · Akgün, Mete; Pfeifer, Nico; Kohlbacher, Oliver · Bioinformatics, Original Papers · Oxford University Press, 2022-02-12 · /pmc/articles/PMC9004657/ · /pubmed/35150254 · http://dx.doi.org/10.1093/bioinformatics/btac070 · Text · en
© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Efficient privacy-preserving whole-genome variant queries
topic Original Papers
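The abstract names XOR-based sharing between two non-colluding proxies as the core mechanism. The Python sketch below shows the generic textbook form of that idea on a toy presence query: each genotype bit and each query-mask bit is split into two random-looking shares, one per proxy, and the proxies jointly compute a shared answer bit without either one seeing the genome or the query. This is not the authors' protocol. The Beaver-triple AND gate with a trusted dealer, the bit-mask encoding of genomes and queries, the function names (`share`, `and_gate`, `contains_all`, etc.) and the simulation of both proxies in one process are all illustrative assumptions.

```python
import secrets

# Hypothetical sketch: generic two-party XOR (Boolean) secret sharing with
# Beaver-triple AND gates. NOT the protocol from the paper; all names and the
# bit-mask encoding are illustrative assumptions.

Share = tuple[int, int]  # (share held by proxy A, share held by proxy B)


def share(bit: int) -> Share:
    """Split one bit into two XOR shares; either share alone is a uniform random bit."""
    r = secrets.randbits(1)
    return r, bit ^ r


def reconstruct(a: int, b: int) -> int:
    """Recombine two XOR shares into the secret bit."""
    return a ^ b


def not_gate(x: Share) -> Share:
    """NOT is local: only proxy A flips its share."""
    return x[0] ^ 1, x[1]


def beaver_triple() -> tuple[Share, Share, Share]:
    """Shares of random bits a, b and c = a AND b (assumes a trusted dealer)."""
    a, b = secrets.randbits(1), secrets.randbits(1)
    return share(a), share(b), share(a & b)


def and_gate(x: Share, y: Share) -> Share:
    """AND costs one round: the proxies open d = x^a and e = y^b, masked by random a and b."""
    a, b, c = beaver_triple()
    d = reconstruct(x[0] ^ a[0], x[1] ^ a[1])
    e = reconstruct(y[0] ^ b[0], y[1] ^ b[1])
    # x AND y = c ^ (d AND b) ^ (e AND a) ^ (d AND e); the public term d AND e is added by proxy A only
    z0 = c[0] ^ (d & b[0]) ^ (e & a[0]) ^ (d & e)
    z1 = c[1] ^ (d & b[1]) ^ (e & a[1])
    return z0, z1


def contains_all(genome: list[Share], query: list[Share]) -> Share:
    """Shared bit that is 1 iff every variant flagged in the query mask is present in the genome."""
    result: Share = (1, 0)  # public constant 1, held as shares (1, 0)
    for d, q in zip(genome, query):
        missing = and_gate(q, not_gate(d))  # 1 iff the variant is queried but absent
        result = and_gate(result, not_gate(missing))
    return result


# Toy genome with 8 variant positions; the query mask asks for positions 1, 4 and 7.
genome_bits = [0, 1, 1, 0, 1, 0, 0, 1]
query_bits = [0, 1, 0, 0, 1, 0, 0, 1]
genome_shares = [share(v) for v in genome_bits]
query_shares = [share(v) for v in query_bits]
print(reconstruct(*contains_all(genome_shares, query_shares)))  # prints 1
```

In this sketch XOR and NOT need no communication, while each AND needs one exchange of masked bits, so a naive presence query over n positions costs 2n AND gates. It only illustrates the privacy mechanism and is not meant to reflect the performance figures reported in the abstract.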