
Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation

BACKGROUND: Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure. OBJECTIVE: In this work we assess the performance of a state-of-the-art neural network approa...


Bibliographic Details
Main Authors: Festag, Sven; Spreckelsen, Cord
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238077/
https://www.ncbi.nlm.nih.gov/pubmed/32369025
http://dx.doi.org/10.2196/14064
collection PubMed
description BACKGROUND: Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure.
OBJECTIVE: In this work we assess the performance of a state-of-the-art neural network approach for the detection of protected health information in texts trained in a collaborative privacy-preserving way.
METHODS: The training adopts distributed selective stochastic gradient descent (ie, it works by exchanging local learning results achieved on private data sets). Five networks were trained on separated real-world clinical data sets by using the privacy-protecting protocol. In total, the data sets contain 1304 real longitudinal patient records for 296 patients.
RESULTS: These networks reached a mean F1 value of 0.955. The gold standard centralized training that is based on the union of all sets and does not take data security into consideration reaches a final value of 0.962.
CONCLUSIONS: Using real-world clinical data, our study shows that detection of protected health information can be secured by collaborative privacy-preserving training. In general, the approach shows the feasibility of deep learning on distributed and confidential clinical data while ensuring data protection.
id pubmed-7238077
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7238077 2020-06-01 Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation. Festag, Sven; Spreckelsen, Cord. JMIR Form Res. Original Paper. JMIR Publications 2020-05-05. /pmc/articles/PMC7238077/ /pubmed/32369025 http://dx.doi.org/10.2196/14064 Text en. ©Sven Festag, Cord Spreckelsen. Originally published in JMIR Formative Research (http://formative.jmir.org), 05.05.2020.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on http://formative.jmir.org, as well as this copyright and license information must be included.
topic Original Paper