Privacy-preserving logistic regression training

BACKGROUND: Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive nature...

Bibliographic Details
Main authors:	Bonte, Charlotte; Vercauteren, Frederik
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6180357/ https://www.ncbi.nlm.nih.gov/pubmed/30309364 http://dx.doi.org/10.1186/s12920-018-0398-y

author	Bonte, Charlotte Vercauteren, Frederik
author_facet	Bonte, Charlotte Vercauteren, Frederik
author_sort	Bonte, Charlotte
collection	PubMed
description	BACKGROUND: Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive nature of the input data requires appropriate privacy preserving measures before outsourcing it. Homomorphic encryption enables one to compute on encrypted data directly, without decryption and can be used to mitigate the privacy concerns raised by using a cloud service. METHODS: In this paper, we propose an algorithm (and its implementation) to train a logistic regression model on a homomorphically encrypted dataset. The core of our algorithm consists of a new iterative method that can be seen as a simplified form of the fixed Hessian method, but with a much lower multiplicative complexity. RESULTS: We test the new method on two interesting real life applications: the first application is in medicine and constructs a model to predict the probability for a patient to have cancer, given genomic data as input; the second application is in finance and the model predicts the probability of a credit card transaction to be fraudulent. The method produces accurate results for both applications, comparable to running standard algorithms on plaintext data. CONCLUSIONS: This article introduces a new simple iterative algorithm to train a logistic regression model that is tailored to be applied on a homomorphically encrypted dataset. This algorithm can be used as a privacy-preserving technique to build a binary classification model and can be applied in a wide range of problems that can be modelled with logistic regression. Our implementation results show that our method can handle the large datasets used in logistic regression training.
format	Online Article Text
id	pubmed-6180357
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6180357 2018-10-18 Privacy-preserving logistic regression training Bonte, Charlotte Vercauteren, Frederik BMC Med Genomics Research
BioMed Central 2018-10-11 /pmc/articles/PMC6180357/ /pubmed/30309364 http://dx.doi.org/10.1186/s12920-018-0398-y Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Bonte, Charlotte Vercauteren, Frederik Privacy-preserving logistic regression training
title	Privacy-preserving logistic regression training
title_sort	privacy-preserving logistic regression training
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6180357/ https://www.ncbi.nlm.nih.gov/pubmed/30309364 http://dx.doi.org/10.1186/s12920-018-0398-y

Similar items