
Distributed Bayesian networks reconstruction on the whole genome scale

BACKGROUND: Bayesian networks are directed acyclic graphical models widely used to represent probabilistic relationships between random variables. They have been applied in various biological contexts, including the inference of gene regulatory networks and protein–protein interactions. Generally, lear...


Bibliographic Details
Main authors: Frolova, Alina; Wilczyński, Bartek
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6197044/
https://www.ncbi.nlm.nih.gov/pubmed/30364537
http://dx.doi.org/10.7717/peerj.5692
author Frolova, Alina
Wilczyński, Bartek
collection PubMed
description BACKGROUND: Bayesian networks are directed acyclic graphical models widely used to represent probabilistic relationships between random variables. They have been applied in various biological contexts, including the inference of gene regulatory networks and protein–protein interactions. In general, learning Bayesian networks from experimental data is NP-hard, which has led to widespread use of heuristic search methods that give suboptimal results. However, when the acyclicity of the graph can be externally ensured, the optimal network can be found in polynomial time. Although our previously developed tool BNFinder implements a polynomial-time algorithm, reconstructing networks from large amounts of experimental data on a single CPU still leads to excessively long computation times. RESULTS: In the present paper we propose a parallelized algorithm designed for multi-core and distributed systems, and its implementation in an improved version of BNFinder, a tool for learning optimal Bayesian networks. The new algorithm has been tested on various simulated and experimental datasets, showing much better parallelization efficiency than the previous version. BNFinder's accuracy is comparable to that of current state-of-the-art inference methods, and it offers a significant advantage when external information such as a list of regulators or prior edge probabilities can be introduced, particularly for datasets with static gene expression observations. CONCLUSIONS: We show that the new method can be used to reconstruct networks in the size range of thousands of genes, making it practically applicable to whole-genome datasets of prokaryotic systems and large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets.
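The abstract's central point, that externally guaranteed acyclicity makes the search tractable and easy to parallelize, can be illustrated with a minimal sketch. It is not BNFinder's code or API. It assumes discrete data, a hypothetical BIC-style score, a cap on the number of parents, and a regulators list disjoint from the targets, so that any choice of parents yields an acyclic graph. Under those assumptions each target's best parent set can be found independently, and the per-target searches can be handed to a worker pool. All function and parameter names below are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import log


def score(data, child, parents):
    """BIC-style penalized negative log-likelihood; lower is better.

    `data` is a list of rows, each a tuple of discrete values, one per variable.
    This scoring function is an assumption chosen for illustration.
    """
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    neg_loglik = -sum(n * log(n / marginal[cfg]) for (cfg, _), n in joint.items())
    child_states = len({row[child] for row in data})
    n_params = len(marginal) * (child_states - 1)
    return neg_loglik + 0.5 * log(len(data)) * n_params


def best_parents(data, child, candidates, max_parents=2):
    """Exhaustively score every parent set of size <= max_parents for one child."""
    best = (score(data, child, ()), ())
    for k in range(1, max_parents + 1):
        for parents in combinations(candidates, k):
            best = min(best, (score(data, child, parents), parents))
    return child, best[1]


def learn_network(data, regulators, targets, max_parents=2, workers=4):
    """Pick the best parent set for each target independently, one task per target.

    Regulators and targets are assumed disjoint, so the result is always a DAG.
    Threads keep the sketch simple; a process pool or a cluster scheduler would
    be the natural replacement for CPU-bound work.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(best_parents, data, t, regulators, max_parents)
            for t in targets
        ]
        return dict(f.result() for f in futures)
```

With a fixed bound on the parent-set size, the number of candidate sets per target is polynomial in the number of regulators, which is what makes the exhaustive per-target search feasible in this sketch.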
format Online
Article
Text
id pubmed-6197044
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6197044 2018-10-24 Distributed Bayesian networks reconstruction on the whole genome scale Frolova, Alina Wilczyński, Bartek PeerJ Bioinformatics PeerJ Inc. 2018-10-19 /pmc/articles/PMC6197044/ /pubmed/30364537 http://dx.doi.org/10.7717/peerj.5692 Text en ©2018 Frolova and Wilczyński http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Distributed Bayesian networks reconstruction on the whole genome scale
topic Bioinformatics