Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation
BACKGROUND: Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure. OBJECTIVE: In this work we assess the performance of a state-of-the-art neural network approa...
Main Authors: | Festag, Sven; Spreckelsen, Cord |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238077/ https://www.ncbi.nlm.nih.gov/pubmed/32369025 http://dx.doi.org/10.2196/14064 |
_version_ | 1783536460476973056 |
---|---|
author | Festag, Sven; Spreckelsen, Cord |
author_facet | Festag, Sven; Spreckelsen, Cord |
author_sort | Festag, Sven |
collection | PubMed |
description | BACKGROUND: Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure. OBJECTIVE: In this work we assess the performance of a state-of-the-art neural network approach for the detection of protected health information in texts trained in a collaborative privacy-preserving way. METHODS: The training adopts distributed selective stochastic gradient descent (ie, it works by exchanging local learning results achieved on private data sets). Five networks were trained on separated real-world clinical data sets by using the privacy-protecting protocol. In total, the data sets contain 1304 real longitudinal patient records for 296 patients. RESULTS: These networks reached a mean F1 value of 0.955. The gold standard centralized training that is based on the union of all sets and does not take data security into consideration reaches a final value of 0.962. CONCLUSIONS: Using real-world clinical data, our study shows that detection of protected health information can be secured by collaborative privacy-preserving training. In general, the approach shows the feasibility of deep learning on distributed and confidential clinical data while ensuring data protection. |
format | Online Article Text |
id | pubmed-7238077 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7238077 2020-06-01 JMIR Form Res Original Paper JMIR Publications 2020-05-05 /pmc/articles/PMC7238077/ /pubmed/32369025 http://dx.doi.org/10.2196/14064 Text en ©Sven Festag, Cord Spreckelsen. Originally published in JMIR Formative Research (http://formative.jmir.org), 05.05.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on http://formative.jmir.org, as well as this copyright and license information must be included. |
title | Privacy-Preserving Deep Learning for the Detection of Protected Health Information in Real-World Data: Comparative Evaluation |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238077/ https://www.ncbi.nlm.nih.gov/pubmed/32369025 http://dx.doi.org/10.2196/14064 |
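
The METHODS statement in the description says that training works by exchanging local learning results instead of the private data itself. The following is a minimal Python sketch of that idea, not the authors' implementation: each site computes its weight changes locally, shares only the largest-magnitude fraction of them, and a server adds the shared values to the global parameters. The function names (`select_largest`, `server_apply`), the site names, the toy numbers, and the sharing fraction of 0.5 are all assumptions made for illustration. The sketch also omits the neural network, the learning rate, and the step in which sites download updated parameters.

```python
def select_largest(deltas, fraction):
    """Return {index: delta} for the `fraction` of entries with the largest magnitude."""
    k = max(1, int(len(deltas) * fraction))
    ranked = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    return {i: deltas[i] for i in ranked[:k]}


def server_apply(global_params, shared):
    """Add one participant's shared weight changes to the global parameters in place."""
    for i, delta in shared.items():
        global_params[i] += delta
    return global_params


# Toy round (hypothetical values): two sites, four parameters,
# each site shares only half of its locally computed weight changes.
global_params = [0.0, 0.0, 0.0, 0.0]
local_deltas = {
    "site_a": [0.5, -0.1, 0.05, -0.9],
    "site_b": [-0.2, 0.7, 0.0, 0.1],
}
for site, deltas in local_deltas.items():
    server_apply(global_params, select_largest(deltas, fraction=0.5))
```

In this sketch the patient records never leave a site; only a sparse subset of weight changes is transmitted, which is the property the description refers to as collaborative privacy-preserving training.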