Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT

Bibliographic Details
Main authors: Deist, Timo M., Jochems, A., van Soest, Johan, Nalbantov, Georgi, Oberije, Cary, Walsh, Seán, Eble, Michael, Bulens, Paul, Coucke, Philippe, Dries, Wim, Dekker, Andre, Lambin, Philippe
Format: Online Article Text
Language: English
Published: Elsevier 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5833935/
https://www.ncbi.nlm.nih.gov/pubmed/29594204
http://dx.doi.org/10.1016/j.ctro.2016.12.004
collection PubMed
description Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future ‘big data’ infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade [Formula: see text]. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.
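The evaluation scheme named in the description (learning on all sites but one, validating on the held-out site, and reporting the AUC) can be sketched as follows. This is only an illustration under stated assumptions, not the euroCAT implementation: the site names and toy values are hypothetical, and fit_centroid_scorer is a simple placeholder learner standing in for the ADMM-trained SVM, which is not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector, binary label 0/1)
Scorer = Callable[[Sequence[float]], float]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(rows: List[Row]) -> Scorer:
    """Placeholder learner (NOT an SVM): score(x) = x . (mean_pos - mean_neg).

    Assumes the training rows contain both classes.
    """
    dim = len(rows[0][0])

    def class_mean(label: int) -> List[float]:
        selected = [x for x, y in rows if y == label]
        return [sum(x[j] for x in selected) / len(selected) for j in range(dim)]

    weights = [p - n for p, n in zip(class_mean(1), class_mean(0))]
    return lambda x: sum(w * v for w, v in zip(weights, x))


def leave_one_site_out(
    sites: Dict[str, List[Row]],
    fit: Callable[[List[Row]], Scorer] = fit_centroid_scorer,
) -> Dict[str, float]:
    """Train on every site except one, then report the AUC on the held-out site."""
    results: Dict[str, float] = {}
    for held_out, test_rows in sites.items():
        train_rows = [
            row for name, rows in sites.items() if name != held_out for row in rows
        ]
        model = fit(train_rows)
        scores = [model(x) for x, _ in test_rows]
        labels = [y for _, y in test_rows]
        results[held_out] = auc(scores, labels)
    return results


# Hypothetical toy data: three sites, one feature, label 1 = outcome present.
toy_sites: Dict[str, List[Row]] = {
    "site_a": [([0.0], 0), ([1.0], 1)],
    "site_b": [([0.2], 0), ([0.9], 1)],
    "site_c": [([0.1], 0), ([1.2], 1)],
}
site_aucs = leave_one_site_out(toy_sites)
```

In a distributed setting the training step would run without pooling the rows in one place; the pooled train_rows list above is a simplification that corresponds to the centralized comparison mentioned in the description.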
id pubmed-5833935
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5833935 2018-03-28 Clin Transl Radiat Oncol Article Elsevier 2017-05-19 /pmc/articles/PMC5833935/ /pubmed/29594204 http://dx.doi.org/10.1016/j.ctro.2016.12.004 Text en © 2016 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Article