Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy
BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. Some privacy-preserving methods are known for analyzing horizontally-partitioned data, where different organisations have similar data on disjoint sets of peo...
Main authors: | Kamphorst, Bart; Rooijakkers, Thomas; Veugen, Thijs; Cellamare, Matteo; Knoors, Daan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8867891/ https://www.ncbi.nlm.nih.gov/pubmed/35209883 http://dx.doi.org/10.1186/s12911-022-01771-3 |
_version_ | 1784656145114202112 |
---|---|
author | Kamphorst, Bart; Rooijakkers, Thomas; Veugen, Thijs; Cellamare, Matteo; Knoors, Daan |
author_sort | Kamphorst, Bart |
collection | PubMed |
description | BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and the various regulations governing access to and combination of data. Some privacy-preserving methods are known for analysing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, where different organisations hold data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained. METHODS: We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. Securely computing the log-partial likelihood in each iteration raises several technical challenges in preserving the efficiency and security of our solution. To tackle these challenges, we generalise a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to provide insight into the computational and communication effort that is needed. RESULTS: Our secure solution is implemented in a setting with three different machines, each representing a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient. CONCLUSIONS: Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably longer, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising a high level of security and privacy. |
format | Online Article Text |
id | pubmed-8867891 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8867891 2022-02-25 Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy Kamphorst, Bart; Rooijakkers, Thomas; Veugen, Thijs; Cellamare, Matteo; Knoors, Daan BMC Med Inform Decis Mak Research BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and the various regulations governing access to and combination of data. Some privacy-preserving methods are known for analysing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, where different organisations hold data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained. METHODS: We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. Securely computing the log-partial likelihood in each iteration raises several technical challenges in preserving the efficiency and security of our solution. To tackle these challenges, we generalise a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to provide insight into the computational and communication effort that is needed. RESULTS: Our secure solution is implemented in a setting with three different machines, each representing a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient. CONCLUSIONS: Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably longer, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising a high level of security and privacy. BioMed Central 2022-02-24 /pmc/articles/PMC8867891/ /pubmed/35209883 http://dx.doi.org/10.1186/s12911-022-01771-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8867891/ https://www.ncbi.nlm.nih.gov/pubmed/35209883 http://dx.doi.org/10.1186/s12911-022-01771-3 |