
Logistic regression over encrypted data from fully homomorphic encryption

BACKGROUND: One of the tasks in the 2017 iDASH secure genome analysis competition was to enable training of logistic regression models over encrypted genomic data. More precisely, given a list of approximately 1500 patient records, each with 18 binary features containing information on specific muta...

Bibliographic Details
Main authors: Chen, Hao; Gilad-Bachrach, Ran; Han, Kyoohyung; Huang, Zhicong; Jalali, Amir; Laine, Kim; Lauter, Kristin
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6180402/
https://www.ncbi.nlm.nih.gov/pubmed/30309350
http://dx.doi.org/10.1186/s12920-018-0397-z
author Chen, Hao
Gilad-Bachrach, Ran
Han, Kyoohyung
Huang, Zhicong
Jalali, Amir
Laine, Kim
Lauter, Kristin
author_sort Chen, Hao
collection PubMed
description BACKGROUND: One of the tasks in the 2017 iDASH secure genome analysis competition was to enable training of logistic regression models over encrypted genomic data. More precisely, given a list of approximately 1500 patient records, each with 18 binary features containing information on specific mutations, the idea was for the data holder to encrypt the records using homomorphic encryption, and send them to an untrusted cloud for storage. The cloud could then homomorphically apply a training algorithm on the encrypted data to obtain an encrypted logistic regression model, which can be sent to the data holder for decryption. In this way, the data holder could successfully outsource the training process without revealing either her sensitive data, or the trained model, to the cloud. METHODS: Our solution to this problem has several novelties: we use a multi-bit plaintext space in fully homomorphic encryption together with fixed point number encoding; we combine bootstrapping in fully homomorphic encryption with a scaling operation in fixed point arithmetic; we use a minimax polynomial approximation to the sigmoid function and the 1-bit gradient descent method to reduce the plaintext growth in the training process. RESULTS: Our algorithm for training over encrypted data takes 0.4–3.2 hours per iteration of gradient descent. CONCLUSIONS: We demonstrate the feasibility but high computational cost of training over encrypted data. On the other hand, our method can guarantee the highest level of data privacy in critical applications.
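The METHODS portion of the description names three ingredients: fixed point encoding, a polynomial approximation to the sigmoid, and 1-bit gradient descent. The following is a minimal plaintext sketch of how those pieces could fit together; it is not the authors' implementation and performs no encryption. Several choices are assumptions made for illustration: "1-bit gradient descent" is interpreted as updating each weight by the sign of its gradient component, Chebyshev interpolation is used as a near-minimax stand-in (a true minimax fit would need something like the Remez algorithm), and the degree, interval radius, step size, number of fractional bits, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    """Exact logistic function, used only as the target of the approximation."""
    return 1.0 / (1.0 + math.exp(-x))


def chebyshev_coeffs(f, degree, radius):
    """Chebyshev interpolation coefficients of f on [-radius, radius].

    Assumption: a near-minimax stand-in, not a true minimax polynomial.
    """
    n = degree + 1
    angles = [math.pi * (j + 0.5) / n for j in range(n)]
    values = [f(radius * math.cos(a)) for a in angles]
    coeffs = []
    for k in range(n):
        s = sum(v * math.cos(k * a) for v, a in zip(values, angles))
        coeffs.append((1.0 if k == 0 else 2.0) * s / n)
    return coeffs


def poly_eval(coeffs, x, radius):
    """Evaluate the Chebyshev series at x with the Clenshaw recurrence."""
    t = x / radius
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * t * b1 - b2, b1
    return coeffs[0] + t * b1 - b2


def quantize(x, frac_bits):
    """Round x to a fixed point value with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def sign(v):
    """Return -1, 0, or 1: the single 'bit' of gradient information kept."""
    return (v > 0) - (v < 0)


def train_one_bit(X, y, iters, lr, degree=3, radius=8.0, frac_bits=8):
    """Plaintext logistic regression with sign-of-gradient updates.

    The model output uses the polynomial in place of the sigmoid, and both
    the output and the weights are rounded to fixed point after each step.
    """
    coeffs = chebyshev_coeffs(sigmoid, degree, radius)
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            err = quantize(poly_eval(coeffs, z, radius), frac_bits) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [quantize(wj - lr * sign(gj), frac_bits) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Classify as 1 when the linear score is positive, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0
```

Because each update moves a weight by a fixed step regardless of the gradient's magnitude, the weights stay on a coarse grid, which is one plausible reading of how the method limits plaintext growth. In this sketch the first column of X is assumed to be a constant 1 acting as the bias term.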
format Online
Article
Text
id pubmed-6180402
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6180402 2018-10-18 Logistic regression over encrypted data from fully homomorphic encryption Chen, Hao Gilad-Bachrach, Ran Han, Kyoohyung Huang, Zhicong Jalali, Amir Laine, Kim Lauter, Kristin BMC Med Genomics Research
BioMed Central 2018-10-11 /pmc/articles/PMC6180402/ /pubmed/30309350 http://dx.doi.org/10.1186/s12920-018-0397-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Logistic regression over encrypted data from fully homomorphic encryption
topic Research