Differentially private distributed logistic regression using private and public data

BACKGROUND: Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for...

Bibliographic Details
Main Authors: Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4101668/
https://www.ncbi.nlm.nih.gov/pubmed/25079786
http://dx.doi.org/10.1186/1755-8794-7-S1-S14
author Ji, Zhanglong
Jiang, Xiaoqian
Wang, Shuang
Xiong, Li
Ohno-Machado, Lucila
collection PubMed
description BACKGROUND: Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise, which makes their outputs less useful. Given the public data available in medical research (e.g., from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. METHODOLOGY: In this paper, we modify the update step of the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. EXPERIMENTS AND RESULTS: We evaluate our algorithm on three different data sets and show its advantage, under various scenarios, over (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data. CONCLUSION: Logistic regression models built with our new algorithm on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
format Online
Article
Text
id pubmed-4101668
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4101668 2014-07-18 Differentially private distributed logistic regression using private and public data Ji, Zhanglong Jiang, Xiaoqian Wang, Shuang Xiong, Li Ohno-Machado, Lucila BMC Med Genomics Research BioMed Central 2014-05-08 /pmc/articles/PMC4101668/ /pubmed/25079786 http://dx.doi.org/10.1186/1755-8794-7-S1-S14 Text en Copyright © 2014 Ji et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Differentially private distributed logistic regression using private and public data
topic Research