
A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models

BACKGROUND: Modern data-driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecul...


Bibliographic Details
Main authors: Eicher, Johanna; Bild, Raffael; Spengler, Helmut; Kuhn, Klaus A.; Prasser, Fabian
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014648/
https://www.ncbi.nlm.nih.gov/pubmed/32046701
http://dx.doi.org/10.1186/s12911-020-1041-3
author Eicher, Johanna
Bild, Raffael
Spengler, Helmut
Kuhn, Klaus A.
Prasser, Fabian
collection PubMed
description BACKGROUND: Modern data-driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools to create and evaluate such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap.
RESULTS: We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g. no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system and prediction of the contraceptive method used by women. In this process, we also used a wide range of privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques.
CONCLUSIONS: With the tool presented in this article, accurate prediction models can be created that preserve the privacy of individuals represented in the training set in a variety of threat scenarios. Our implementation is available as open-source software.
format Online
Article
Text
id pubmed-7014648
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7014648 2020-02-18 BMC Med Inform Decis Mak Software BioMed Central 2020-02-11 /pmc/articles/PMC7014648/ /pubmed/32046701 http://dx.doi.org/10.1186/s12911-020-1041-3 Text en
© The Author(s) 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models
topic Software