
Bayesian Inference-Based Gaussian Mixture Models With Optimal Components Estimation Towards Large-Scale Synthetic Data Generation for In Silico Clinical Trials

Goal: To develop a computationally efficient and unbiased synthetic data generator for large-scale in silico clinical trials (CTs). Methods: We propose the BGMM-OCE, an extension of the conventional BGMM (Bayesian Gaussian Mixture Models) algorithm to provide unbiased estimations regarding the optim...


Bibliographic Details
Format: Online Article Text
Language: English
Published: IEEE 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9970043/
https://www.ncbi.nlm.nih.gov/pubmed/36860496
http://dx.doi.org/10.1109/OJEMB.2022.3181796
_version_ 1784897838706065408
collection PubMed
description Goal: To develop a computationally efficient and unbiased synthetic data generator for large-scale in silico clinical trials (CTs). Methods: We propose the BGMM-OCE, an extension of the conventional BGMM (Bayesian Gaussian Mixture Models) algorithm to provide unbiased estimations regarding the optimal number of Gaussian components and yield high-quality, large-scale synthetic data at reduced computational complexity. Spectral clustering with efficient eigenvalue decomposition is applied to estimate the hyperparameters of the generator. A case study is conducted to compare the performance of BGMM-OCE against four straightforward synthetic data generators for in silico CTs in hypertrophic cardiomyopathy (HCM). Results: The BGMM-OCE generated 30000 virtual patient profiles having the lowest coefficient-of-variation (0.046) and the lowest inter- and intra-correlation differences (0.017 and 0.016, respectively) with the real ones in reduced execution time. Conclusions: BGMM-OCE overcomes the lack of population size in HCM, which obscures the development of targeted therapies and robust risk stratification models.
format Online
Article
Text
id pubmed-9970043
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher IEEE
record_format MEDLINE/PubMed
spelling pubmed-9970043 2023-02-28 IEEE Open J Eng Med Biol Article IEEE 2022-06-10 /pmc/articles/PMC9970043/ /pubmed/36860496 http://dx.doi.org/10.1109/OJEMB.2022.3181796 Text en This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
title Bayesian Inference-Based Gaussian Mixture Models With Optimal Components Estimation Towards Large-Scale Synthetic Data Generation for In Silico Clinical Trials
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9970043/
https://www.ncbi.nlm.nih.gov/pubmed/36860496
http://dx.doi.org/10.1109/OJEMB.2022.3181796
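
Illustrative note: the abstract describes generating virtual patient profiles from a fitted Gaussian mixture. The sketch below shows only the generic sampling step for a one-dimensional mixture, using the Python standard library. It is not the BGMM-OCE algorithm: the function name, weights, means, and standard deviations are hypothetical values chosen for illustration, and the Bayesian estimation of the number of components and the spectral-clustering step mentioned in the abstract are not reproduced here.

```python
import random


def sample_gaussian_mixture(weights, means, stds, n, seed=0):
    """Draw n values from a 1-D Gaussian mixture.

    All parameters are hypothetical; in a real generator they would
    come from a model fitted to the real patient data.
    """
    rng = random.Random(seed)
    component_ids = range(len(weights))
    samples = []
    for _ in range(n):
        # Pick a component in proportion to its mixing weight.
        k = rng.choices(component_ids, weights=weights, k=1)[0]
        # Draw one value from that component's normal distribution.
        samples.append(rng.gauss(means[k], stds[k]))
    return samples


# Hypothetical two-component mixture, sampled 30000 times.
virtual = sample_gaussian_mixture([0.3, 0.7], [0.0, 10.0], [1.0, 2.0], 30000)
print(len(virtual), sum(virtual) / len(virtual))
```

With enough samples, the sample mean should approach the weighted mean of the component means (here 0.3 × 0.0 + 0.7 × 10.0 = 7.0), which is a quick sanity check on any mixture sampler.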