CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints

BACKGROUND: Finding a globally optimal Bayesian Network using exhaustive search is a problem with super-exponential complexity, which severely restricts the number of variables that can feasibly be included. We implement a dynamic programming based algorithm with built-in dimensionality reduction an...

Bibliographic Details
Main Authors: Sharma, Nand, Millstein, Joshua
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926787/
https://www.ncbi.nlm.nih.gov/pubmed/36788490
http://dx.doi.org/10.1186/s12859-023-05159-6
_version_ 1784888350918836224
author Sharma, Nand
Millstein, Joshua
author_facet Sharma, Nand
Millstein, Joshua
author_sort Sharma, Nand
collection PubMed
description BACKGROUND: Finding a globally optimal Bayesian Network using exhaustive search is a problem with super-exponential complexity, which severely restricts the number of variables that can feasibly be included. We implement a dynamic programming based algorithm with built-in dimensionality reduction and parent set identification. This reduces the search space substantially and can be applied to large-dimensional data. We use what we call ‘generational orderings’ based search for optimal networks, which is a novel way to efficiently search the space of possible networks given the possible parent sets. The algorithm supports both continuous and categorical data, as well as continuous, binary and survival outcomes. RESULTS: We demonstrate the efficacy of our algorithm on both synthetic and real data. In simulations, our algorithm performs better than three state-of-the-art algorithms that are currently used extensively. We then apply it to an Ovarian Cancer gene expression dataset with 513 genes and a survival outcome. Our algorithm is able to find an optimal network describing the disease pathway consisting of 6 genes leading to the outcome node in just 3.4 min on a personal computer with a 2.3 GHz Intel Core i9 processor with 16 GB RAM. CONCLUSIONS: Our generational orderings based search for optimal networks is both an efficient and highly scalable approach for finding optimal Bayesian Networks and can be applied to thousands of variables. Using specifiable parameters (correlation, FDR cutoffs, and in-degree), one can increase or decrease the number of nodes and density of the networks. Availability of two scoring options (BIC and Bge) and implementation for survival outcomes and mixed data types makes our algorithm very suitable for many types of high dimensional data in a variety of fields.
format Online
Article
Text
id pubmed-9926787
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9926787 2023-02-15 CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints Sharma, Nand Millstein, Joshua BMC Bioinformatics Research BACKGROUND: Finding a globally optimal Bayesian Network using exhaustive search is a problem with super-exponential complexity, which severely restricts the number of variables that can feasibly be included. We implement a dynamic programming based algorithm with built-in dimensionality reduction and parent set identification. This reduces the search space substantially and can be applied to large-dimensional data. We use what we call ‘generational orderings’ based search for optimal networks, which is a novel way to efficiently search the space of possible networks given the possible parent sets. The algorithm supports both continuous and categorical data, as well as continuous, binary and survival outcomes. RESULTS: We demonstrate the efficacy of our algorithm on both synthetic and real data. In simulations, our algorithm performs better than three state-of-the-art algorithms that are currently used extensively. We then apply it to an Ovarian Cancer gene expression dataset with 513 genes and a survival outcome. Our algorithm is able to find an optimal network describing the disease pathway consisting of 6 genes leading to the outcome node in just 3.4 min on a personal computer with a 2.3 GHz Intel Core i9 processor with 16 GB RAM. CONCLUSIONS: Our generational orderings based search for optimal networks is both an efficient and highly scalable approach for finding optimal Bayesian Networks and can be applied to thousands of variables. Using specifiable parameters (correlation, FDR cutoffs, and in-degree), one can increase or decrease the number of nodes and density of the networks. Availability of two scoring options (BIC and Bge) and implementation for survival outcomes and mixed data types makes our algorithm very suitable for many types of high dimensional data in a variety of fields. BioMed Central 2023-02-14 /pmc/articles/PMC9926787/ /pubmed/36788490 http://dx.doi.org/10.1186/s12859-023-05159-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Sharma, Nand
Millstein, Joshua
CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints
title CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints
title_full CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints
title_fullStr CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints
title_full_unstemmed CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints
title_short CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints
title_sort causnet: generational orderings based search for optimal bayesian networks via dynamic programming with parent set constraints
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926787/
https://www.ncbi.nlm.nih.gov/pubmed/36788490
http://dx.doi.org/10.1186/s12859-023-05159-6
work_keys_str_mv AT sharmanand causnetgenerationalorderingsbasedsearchforoptimalbayesiannetworksviadynamicprogrammingwithparentsetconstraints
AT millsteinjoshua causnetgenerationalorderingsbasedsearchforoptimalbayesiannetworksviadynamicprogrammingwithparentsetconstraints