Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT
Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal bound...
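Main authors: | Deist, Timo M.; Jochems, A.; van Soest, Johan; Nalbantov, Georgi; Oberije, Cary; Walsh, Seán; Eble, Michael; Bulens, Paul; Coucke, Philippe; Dries, Wim; Dekker, Andre; Lambin, Philippe |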
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5833935/ https://www.ncbi.nlm.nih.gov/pubmed/29594204 http://dx.doi.org/10.1016/j.ctro.2016.12.004 |
_version_ | 1783303568903634944 |
---|---|
author | Deist, Timo M. Jochems, A. van Soest, Johan Nalbantov, Georgi Oberije, Cary Walsh, Seán Eble, Michael Bulens, Paul Coucke, Philippe Dries, Wim Dekker, Andre Lambin, Philippe |
author_facet | Deist, Timo M. Jochems, A. van Soest, Johan Nalbantov, Georgi Oberije, Cary Walsh, Seán Eble, Michael Bulens, Paul Coucke, Philippe Dries, Wim Dekker, Andre Lambin, Philippe |
author_sort | Deist, Timo M. |
collection | PubMed |
description | Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future ‘big data’ infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade [Formula: see text]. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine. |
format | Online Article Text |
id | pubmed-5833935 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-5833935 2018-03-28 Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT Deist, Timo M. Jochems, A. van Soest, Johan Nalbantov, Georgi Oberije, Cary Walsh, Seán Eble, Michael Bulens, Paul Coucke, Philippe Dries, Wim Dekker, Andre Lambin, Philippe Clin Transl Radiat Oncol Article Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future ‘big data’ infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade [Formula: see text]. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine. Elsevier 2017-05-19 /pmc/articles/PMC5833935/ /pubmed/29594204 http://dx.doi.org/10.1016/j.ctro.2016.12.004 Text en © 2016 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Deist, Timo M. Jochems, A. van Soest, Johan Nalbantov, Georgi Oberije, Cary Walsh, Seán Eble, Michael Bulens, Paul Coucke, Philippe Dries, Wim Dekker, Andre Lambin, Philippe Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT |
title | Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT |
title_sort | infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: eurocat |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5833935/ https://www.ncbi.nlm.nih.gov/pubmed/29594204 http://dx.doi.org/10.1016/j.ctro.2016.12.004 |
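
The abstract describes a five-fold cross-validation in which a model is learned on four sites and validated on the fifth, with discrimination measured by AUC. The following is a minimal Python sketch of that evaluation scheme only. It does not reproduce the paper's SVM or its ADMM-based distributed training: `fit_mean_difference` is a hypothetical placeholder learner, the site names and feature values in `SITES` are synthetic, and the sketch pools the training rows in one process for simplicity, which the real infrastructure is designed to avoid.

```python
from statistics import mean


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(rows):
    """Placeholder learner (assumption, NOT the paper's SVM/ADMM):
    per-feature weight = mean of positives minus mean of negatives."""
    n_features = len(rows[0][0])
    weights = []
    for j in range(n_features):
        pos = [x[j] for x, y in rows if y == 1]
        neg = [x[j] for x, y in rows if y == 0]
        weights.append(mean(pos) - mean(neg))
    return weights


def score(weights, x):
    """Linear score of one feature vector."""
    return sum(w * v for w, v in zip(weights, x))


def leave_one_site_out(sites):
    """Train on all sites but one, validate on the held-out site, repeat per site."""
    results = {}
    for held_out in sites:
        train = [row for name, rows in sites.items() if name != held_out for row in rows]
        weights = fit_mean_difference(train)
        test = sites[held_out]
        results[held_out] = auc([score(weights, x) for x, _ in test],
                                [y for _, y in test])
    return results


# Synthetic toy data: (features, label) pairs per site; label 1 marks the outcome.
SITES = {
    "site_a": [((0.9, 0.2), 1), ((0.1, 0.3), 0), ((0.8, 0.6), 1), ((0.3, 0.4), 0)],
    "site_b": [((0.7, 0.5), 1), ((0.2, 0.1), 0), ((0.6, 0.3), 1), ((0.4, 0.7), 0)],
    "site_c": [((0.8, 0.4), 1), ((0.3, 0.2), 0), ((0.5, 0.5), 1), ((0.2, 0.6), 0)],
    "site_d": [((0.9, 0.7), 1), ((0.4, 0.3), 0), ((0.7, 0.2), 1), ((0.1, 0.5), 0)],
    "site_e": [((0.6, 0.6), 1), ((0.2, 0.4), 0), ((0.8, 0.1), 1), ((0.3, 0.3), 0)],
}

results = leave_one_site_out(SITES)
for site, value in results.items():
    print(f"held-out {site}: AUC = {value:.2f}")
```

Holding out a whole site, rather than a random fifth of the patients, tests whether a model transfers to a clinic it has never seen, which is the question a multi-centric network needs answered.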