
Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation


Bibliographic Details
Main Authors: Sung, MinDong, Cha, Dongchul, Park, Yu Rang
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8663640/
https://www.ncbi.nlm.nih.gov/pubmed/34747711
http://dx.doi.org/10.2196/26914
_version_ 1784613684956364800
author Sung, MinDong
Cha, Dongchul
Park, Yu Rang
author_facet Sung, MinDong
Cha, Dongchul
Park, Yu Rang
author_sort Sung, MinDong
collection PubMed
description BACKGROUND: Privacy is of increasing interest in the present big data era, particularly the privacy of medical data. Specifically, differential privacy has emerged as the standard method for preservation of privacy during data analysis and publishing. OBJECTIVE: Using machine learning techniques, we applied differential privacy to medical data with diverse parameters and checked the feasibility of our algorithms with synthetic data as well as the balance between data privacy and utility. METHODS: All data were normalized to a range between –1 and 1, and the bounded Laplacian method was applied to prevent the generation of out-of-bound values after applying the differential privacy algorithm. To preserve the cardinality of the categorical variables, we performed postprocessing via discretization. The algorithm was evaluated using both synthetic and real-world data (from the eICU Collaborative Research Database). We evaluated the difference between the original data and the perturbed data using the misclassification rate for categorical data and the mean squared error for continuous data. Further, we compared the performance of classification models that predict in-hospital mortality using real-world data. RESULTS: The misclassification rate of the categorical variables ranged between 0.49 and 0.85 when ε was 0.1, and it converged to 0 as ε increased; when ε was between 10^2 and 10^3, the misclassification rate dropped rapidly to 0. Similarly, the mean squared error of the continuous variables decreased as ε increased. The performance of the model developed from perturbed data converged to that of the model developed from the original data as ε increased. In particular, the accuracy of a random forest model developed from the original data was 0.801, and the corresponding value for perturbed data ranged from 0.757 (ε = 10^-1) to 0.81 (ε = 10^4).
CONCLUSIONS: We applied local differential privacy to medical domain data, which are diverse and high dimensional. Higher noise may offer enhanced privacy, but it simultaneously hinders utility. An appropriate degree of noise for data perturbation should be chosen to balance privacy and utility depending on the specific situation.
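The pipeline described in METHODS (normalize to [–1, 1], add Laplace noise while keeping values in bounds, discretize categorical variables back to their original cardinality, then score with the misclassification rate and mean squared error) can be sketched in Python. This is a minimal illustration, not the authors' released code: the clipping step below is a simple stand-in for the bounded Laplacian mechanism the paper names, and all function names are hypothetical.

```python
import numpy as np

def perturb_continuous(x, epsilon, lo=-1.0, hi=1.0, rng=None):
    """Laplace-perturb values normalized to [lo, hi], clipping out-of-bound
    outputs back into range (a simple approximation of the bounded Laplacian)."""
    if rng is None:
        rng = np.random.default_rng()
    scale = (hi - lo) / epsilon  # sensitivity of a bounded value is hi - lo
    noisy = x + rng.laplace(0.0, scale, size=np.shape(x))
    return np.clip(noisy, lo, hi)

def perturb_categorical(codes, n_categories, epsilon, rng=None):
    """Perturb integer category codes: map them onto [-1, 1], add noise,
    then discretize back so the variable's cardinality is preserved."""
    centers = np.linspace(-1.0, 1.0, n_categories)
    noisy = perturb_continuous(centers[codes], epsilon, rng=rng)
    # discretization postprocessing: snap each value to the nearest center
    return np.abs(noisy[:, None] - centers[None, :]).argmin(axis=1)

def misclassification_rate(original, perturbed):
    """Fraction of categorical values changed by perturbation."""
    return float(np.mean(original != perturbed))

def mse(original, perturbed):
    """Mean squared error between original and perturbed continuous values."""
    return float(np.mean((original - perturbed) ** 2))
```

Both utility metrics behave as the RESULTS describe: as ε grows, the Laplace scale shrinks, so the misclassification rate and MSE both converge to 0.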
format Online
Article
Text
id pubmed-8663640
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8663640 2021-12-30 Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation Sung, MinDong Cha, Dongchul Park, Yu Rang JMIR Med Inform Original Paper JMIR Publications 2021-11-08 /pmc/articles/PMC8663640/ /pubmed/34747711 http://dx.doi.org/10.2196/26914 Text en ©MinDong Sung, Dongchul Cha, Yu Rang Park. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 08.11.2021. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Sung, MinDong
Cha, Dongchul
Park, Yu Rang
Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation
title Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation
title_full Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation
title_fullStr Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation
title_full_unstemmed Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation
title_short Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation
title_sort local differential privacy in the medical domain to protect sensitive information: algorithm development and real-world validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8663640/
https://www.ncbi.nlm.nih.gov/pubmed/34747711
http://dx.doi.org/10.2196/26914
work_keys_str_mv AT sungmindong localdifferentialprivacyinthemedicaldomaintoprotectsensitiveinformationalgorithmdevelopmentandrealworldvalidation
AT chadongchul localdifferentialprivacyinthemedicaldomaintoprotectsensitiveinformationalgorithmdevelopmentandrealworldvalidation
AT parkyurang localdifferentialprivacyinthemedicaldomaintoprotectsensitiveinformationalgorithmdevelopmentandrealworldvalidation