A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models
BACKGROUND: Modern data driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecul...
Main Authors: | Eicher, Johanna, Bild, Raffael, Spengler, Helmut, Kuhn, Klaus A., Prasser, Fabian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7014648/ https://www.ncbi.nlm.nih.gov/pubmed/32046701 http://dx.doi.org/10.1186/s12911-020-1041-3 |
author | Eicher, Johanna Bild, Raffael Spengler, Helmut Kuhn, Klaus A. Prasser, Fabian |
---|---|
collection | PubMed |
description | BACKGROUND: Modern data driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools to create and evaluate such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap. RESULTS: We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g. no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system and prediction of the contraceptive method used by women. In this process, we also used a wide range of different privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques. CONCLUSIONS: With the tool presented in this article, accurate prediction models can be created that preserve the privacy of individuals represented in the training set in a variety of threat scenarios. Our implementation is available as open source software. |
format | Online Article Text |
id | pubmed-7014648 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7014648 2020-02-18 A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models Eicher, Johanna Bild, Raffael Spengler, Helmut Kuhn, Klaus A. Prasser, Fabian BMC Med Inform Decis Mak Software BioMed Central 2020-02-11 /pmc/articles/PMC7014648/ /pubmed/32046701 http://dx.doi.org/10.1186/s12911-020-1041-3 Text en © The Author(s) 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models |
topic | Software |