
Variational Autoencoder Modular Bayesian Networks for Simulation of Heterogeneous Clinical Study Data

Bibliographic Details
Main authors: Gootjes-Dreesbach, Luise, Sood, Meemansa, Sahay, Akrishta, Hofmann-Apitius, Martin, Fröhlich, Holger
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931863/
https://www.ncbi.nlm.nih.gov/pubmed/33693390
http://dx.doi.org/10.3389/fdata.2020.00016
author Gootjes-Dreesbach, Luise
Sood, Meemansa
Sahay, Akrishta
Hofmann-Apitius, Martin
Fröhlich, Holger
collection PubMed
description In the area of Big Data, one of the major obstacles to the progress of biomedical research is the existence of data “silos”, because legal and ethical constraints often do not allow sharing sensitive patient data from clinical studies across institutions. While federated machine learning now allows building models from scattered data of the same format, there is still a need to investigate, mine, and understand data from separate and very differently designed clinical studies that can only be accessed within each of the data-hosting organizations. Simulation of sufficiently realistic virtual patients based on the data within each individual organization could be a way to fill this gap. In this work, we propose a new machine learning approach [Variational Autoencoder Modular Bayesian Network (VAMBN)] to learn a generative model of longitudinal clinical study data. VAMBN considers typical key aspects of such data, namely a limited sample size coupled with comparably many variables of different numerical scales and statistical properties, and many missing values. We show that with VAMBN, we can simulate virtual patients in a sufficiently realistic manner while providing theoretical guarantees of data privacy. In addition, VAMBN allows for simulating counterfactual scenarios. Hence, VAMBN could facilitate data sharing as well as the design of clinical trials.
format Online
Article
Text
id pubmed-7931863
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7931863 2021-03-09 Front Big Data. Frontiers Media S.A. 2020-05-28. /pmc/articles/PMC7931863/ /pubmed/33693390 http://dx.doi.org/10.3389/fdata.2020.00016 Text en. Copyright © 2020 Gootjes-Dreesbach, Sood, Sahay, Hofmann-Apitius and Fröhlich. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Big Data