Cargando…

Efficient privacy-preserving string search and an application in genomics

Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated...

Descripción completa

Detalles Bibliográficos
Autores principales: Shimizu, Kana, Nuida, Koji, Rätsch, Gunnar
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4892414/
https://www.ncbi.nlm.nih.gov/pubmed/27153731
http://dx.doi.org/10.1093/bioinformatics/btw050
_version_ 1782435382919430144
author Shimizu, Kana
Nuida, Koji
Rätsch, Gunnar
author_facet Shimizu, Kana
Nuida, Koji
Rätsch, Gunnar
author_sort Shimizu, Kana
collection PubMed
description Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows–Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows–Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within [Formula: see text] 4.6 s and [Formula: see text] 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Contacts: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.org Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4892414
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-48924142016-06-07 Efficient privacy-preserving string search and an application in genomics Shimizu, Kana Nuida, Koji Rätsch, Gunnar Bioinformatics Hitseq Papers Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows–Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows–Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within [Formula: see text] 4.6 s and [Formula: see text] 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Contacts: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.org Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-06-01 2016-03-02 /pmc/articles/PMC4892414/ /pubmed/27153731 http://dx.doi.org/10.1093/bioinformatics/btw050 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Hitseq Papers
Shimizu, Kana
Nuida, Koji
Rätsch, Gunnar
Efficient privacy-preserving string search and an application in genomics
title Efficient privacy-preserving string search and an application in genomics
title_full Efficient privacy-preserving string search and an application in genomics
title_fullStr Efficient privacy-preserving string search and an application in genomics
title_full_unstemmed Efficient privacy-preserving string search and an application in genomics
title_short Efficient privacy-preserving string search and an application in genomics
title_sort efficient privacy-preserving string search and an application in genomics
topic Hitseq Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4892414/
https://www.ncbi.nlm.nih.gov/pubmed/27153731
http://dx.doi.org/10.1093/bioinformatics/btw050
work_keys_str_mv AT shimizukana efficientprivacypreservingstringsearchandanapplicationingenomics
AT nuidakoji efficientprivacypreservingstringsearchandanapplicationingenomics
AT ratschgunnar efficientprivacypreservingstringsearchandanapplicationingenomics