
Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy

BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. Some privacy-preserving methods are known for analyzing horizontally-partitioned data, where different organisations have similar data on disjoint sets of peo...


Bibliographic Details
Main Authors: Kamphorst, Bart; Rooijakkers, Thomas; Veugen, Thijs; Cellamare, Matteo; Knoors, Daan
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8867891/
https://www.ncbi.nlm.nih.gov/pubmed/35209883
http://dx.doi.org/10.1186/s12911-022-01771-3
_version_ 1784656145114202112
author Kamphorst, Bart
Rooijakkers, Thomas
Veugen, Thijs
Cellamare, Matteo
Knoors, Daan
author_facet Kamphorst, Bart
Rooijakkers, Thomas
Veugen, Thijs
Cellamare, Matteo
Knoors, Daan
author_sort Kamphorst, Bart
collection PubMed
description BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. Some privacy-preserving methods are known for analyzing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, dealing with data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained. METHODS: We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. In order to securely compute the log-partial likelihood in each iteration, we run into several technical challenges to preserve the efficiency and security of our solution. To tackle these technical challenges, we generalize a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to get insight into the computational and communication effort that is needed. RESULTS: Our secure solution is implemented in a setting with three different machines, each representing a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient. CONCLUSIONS: Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably larger, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising a high level of security and privacy.
format Online
Article
Text
id pubmed-8867891
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8867891 2022-02-25 Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy Kamphorst, Bart Rooijakkers, Thomas Veugen, Thijs Cellamare, Matteo Knoors, Daan BMC Med Inform Decis Mak Research BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. Some privacy-preserving methods are known for analyzing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, dealing with data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained. METHODS: We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. In order to securely compute the log-partial likelihood in each iteration, we run into several technical challenges to preserve the efficiency and security of our solution. To tackle these technical challenges, we generalize a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to get insight into the computational and communication effort that is needed. RESULTS: Our secure solution is implemented in a setting with three different machines, each representing a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient. CONCLUSIONS: Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably larger, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising a high level of security and privacy. BioMed Central 2022-02-24 /pmc/articles/PMC8867891/ /pubmed/35209883 http://dx.doi.org/10.1186/s12911-022-01771-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Kamphorst, Bart
Rooijakkers, Thomas
Veugen, Thijs
Cellamare, Matteo
Knoors, Daan
Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy
title Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8867891/
https://www.ncbi.nlm.nih.gov/pubmed/35209883
http://dx.doi.org/10.1186/s12911-022-01771-3
work_keys_str_mv AT kamphorstbart accuratetrainingofthecoxproportionalhazardsmodelonverticallypartitioneddatawhilepreservingprivacy
AT rooijakkersthomas accuratetrainingofthecoxproportionalhazardsmodelonverticallypartitioneddatawhilepreservingprivacy
AT veugenthijs accuratetrainingofthecoxproportionalhazardsmodelonverticallypartitioneddatawhilepreservingprivacy
AT cellamarematteo accuratetrainingofthecoxproportionalhazardsmodelonverticallypartitioneddatawhilepreservingprivacy
AT knoorsdaan accuratetrainingofthecoxproportionalhazardsmodelonverticallypartitioneddatawhilepreservingprivacy
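
The METHODS summary in the description field mentions a Newton-Raphson solver that maximises the Cox log-partial likelihood and needs the inverse of the Hessian in every iteration. As a rough illustration of that iteration only, here is a minimal plaintext (non-secure) sketch in Python. It is not the authors' MPC implementation and uses no MPyC calls; the function names cox_newton_raphson and solve are hypothetical, and the sketch assumes there are no tied event times.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def cox_newton_raphson(times, events, X, iterations=20, tol=1e-9):
    """Plaintext Newton-Raphson for the Cox partial likelihood.

    times:  observed time per subject
    events: 1 if the event was observed, 0 if censored
    X:      one covariate row per subject
    Assumption (not from the source): no tied event times.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # negative Hessian
        for i in range(n):
            if not events[i]:
                continue  # censored subjects only appear in risk sets
            # Risk set: subjects still under observation at times[i].
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(sum(b * x for b, x in zip(beta, X[j]))) for j in risk]
            s0 = sum(w)
            s1 = [sum(wj * X[j][a] for wj, j in zip(w, risk)) for a in range(p)]
            for a in range(p):
                grad[a] += X[i][a] - s1[a] / s0
                for b in range(p):
                    s2 = sum(wj * X[j][a] * X[j][b] for wj, j in zip(w, risk))
                    info[a][b] += s2 / s0 - (s1[a] / s0) * (s1[b] / s0)
        # Newton step: beta <- beta + info^{-1} grad (info = -Hessian).
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Calling cox_newton_raphson(times, events, X) with three equal-length lists returns one coefficient per covariate. The exponentiations and the linear solve in each iteration correspond to the two operations the abstract singles out as hard to perform securely.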