
A Privacy-Preserving Distributed Analytics Platform for Health Care Data


Bibliographic Details
Main authors: Welten, Sascha; Mou, Yongli; Neumann, Laurenz; Jaberansary, Mehrshad; Yediel Ucer, Yeliz; Kirsten, Toralf; Decker, Stefan; Beyan, Oya
Format: Online Article Text
Language: English
Published: Georg Thieme Verlag KG 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9246511/
https://www.ncbi.nlm.nih.gov/pubmed/35038764
http://dx.doi.org/10.1055/s-0041-1740564
Description
Summary: Background: In recent years, data-driven medicine has gained increasing importance for diagnosis, treatment, and research due to the exponential growth of health care data. However, data protection regulations prohibit data centralisation for analysis purposes because of potential privacy risks such as the accidental disclosure of data to third parties. Therefore, alternative data usage policies that comply with present privacy guidelines are of particular interest. Objective: We aim to enable analyses on sensitive patient data while complying with local data protection regulations, using an approach called the Personal Health Train (PHT), a paradigm that utilises distributed analytics (DA) methods. The main principle of the PHT is that the analytical task is brought to the data provider while the data instances remain in their original location. Methods: In this work, we present our implementation of the PHT paradigm, which preserves the sovereignty and autonomy of the data providers and operates with a limited number of communication channels. We further conduct a DA use case on data stored at three distinct, distributed data providers. Results: We show that our infrastructure enables the training of data models based on distributed data sources. Conclusion: Our work presents the capabilities of DA infrastructures in the health care sector, which lower the regulatory obstacles to sharing patient data. We further demonstrate their ability to fuel medical science by making distributed data sets available to scientists and health care practitioners.
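
The principle stated in the summary, that the analytical task travels to the data while the records stay where they are, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Station`, `Train`, and `dispatch`, the sample values, and the choice of a running mean as the analysis are all assumptions made for illustration. Three stations are used only to mirror the three data providers mentioned in the summary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Train:
    """Hypothetical analytical task; it carries only aggregate state."""
    count: int = 0
    total: float = 0.0

    def run(self, local_records: List[float]) -> None:
        # Update the aggregates from the records of the current station.
        self.count += len(local_records)
        self.total += sum(local_records)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


@dataclass
class Station:
    """Hypothetical data provider; its records are never returned."""
    name: str
    _records: List[float] = field(repr=False)

    def execute(self, train: Train) -> None:
        # The station runs the visiting task locally. Only the train's
        # aggregate state changes; the raw records stay in the station.
        train.run(self._records)


def dispatch(train: Train, route: List[Station]) -> Train:
    """Send the train to each station in turn (assumed sequential route)."""
    for station in route:
        station.execute(train)
    return train


# Illustrative values only, not data from the study.
stations = [
    Station("station_a", [1.0, 2.0, 3.0]),
    Station("station_b", [4.0, 5.0]),
    Station("station_c", [6.0]),
]
result = dispatch(Train(), stations)
print(result.count, result.mean)
```

In this sketch only the two aggregates (`count` and `total`) leave each station. A real deployment would additionally need the access control and restricted communication channels that the summary attributes to the platform, which this example does not model.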