Distributed Bayesian networks reconstruction on the whole genome scale
BACKGROUND: Bayesian networks are directed acyclic graphical models widely used to represent the probabilistic relationships between random variables. They have been applied in various biological contexts, including gene regulatory networks and protein–protein interactions inference. Generally, lear...
Main authors: | Frolova, Alina; Wilczyński, Bartek |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2018 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6197044/ https://www.ncbi.nlm.nih.gov/pubmed/30364537 http://dx.doi.org/10.7717/peerj.5692 |
_version_ | 1783364679210369024 |
---|---|
author | Frolova, Alina Wilczyński, Bartek |
author_facet | Frolova, Alina Wilczyński, Bartek |
author_sort | Frolova, Alina |
collection | PubMed |
description | BACKGROUND: Bayesian networks are directed acyclic graphical models widely used to represent probabilistic relationships between random variables. They have been applied in various biological contexts, including the inference of gene regulatory networks and protein–protein interactions. In general, learning Bayesian networks from experimental data is NP-hard, which has led to the widespread use of heuristic search methods that give suboptimal results. However, when the acyclicity of the graph can be ensured externally, the optimal network can be found in polynomial time. Although our previously developed tool BNFinder implements a polynomial-time algorithm, reconstructing networks from large amounts of experimental data still leads to excessively long computations on a single CPU. RESULTS: In the present paper we propose a parallelized algorithm designed for multi-core and distributed systems, and its implementation in an improved version of BNFinder, a tool for learning optimal Bayesian networks. The new algorithm has been tested on a range of simulated and experimental datasets, showing that it parallelizes much more efficiently than the previous version. BNFinder achieves accuracy comparable to current state-of-the-art inference methods and offers a significant advantage when external information, such as a list of regulators or prior edge probabilities, can be introduced, particularly for datasets with static gene expression observations. CONCLUSIONS: We show that the new method can be used to reconstruct networks in the size range of thousands of genes, making it practically applicable to whole-genome datasets of prokaryotic systems and to large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets. |
format | Online Article Text |
id | pubmed-6197044 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6197044 2018-10-24 Distributed Bayesian networks reconstruction on the whole genome scale Frolova, Alina Wilczyński, Bartek PeerJ Bioinformatics PeerJ Inc. 2018-10-19 /pmc/articles/PMC6197044/ /pubmed/30364537 http://dx.doi.org/10.7717/peerj.5692 Text en ©2018 Frolova and Wilczyński http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Frolova, Alina Wilczyński, Bartek Distributed Bayesian networks reconstruction on the whole genome scale |
title | Distributed Bayesian networks reconstruction on the whole genome scale |
title_full | Distributed Bayesian networks reconstruction on the whole genome scale |
title_fullStr | Distributed Bayesian networks reconstruction on the whole genome scale |
title_full_unstemmed | Distributed Bayesian networks reconstruction on the whole genome scale |
title_short | Distributed Bayesian networks reconstruction on the whole genome scale |
title_sort | distributed bayesian networks reconstruction on the whole genome scale |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6197044/ https://www.ncbi.nlm.nih.gov/pubmed/30364537 http://dx.doi.org/10.7717/peerj.5692 |
work_keys_str_mv | AT frolovaalina distributedbayesiannetworksreconstructiononthewholegenomescale AT wilczynskibartek distributedbayesiannetworksreconstructiononthewholegenomescale |
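
The description notes that when acyclicity is ensured externally (for example, by a list of regulators), the optimal network can be found in polynomial time, and that the search can be parallelized. The sketch below illustrates the general idea only and is not BNFinder's code or API: if edges may run only from regulators to non-regulator target genes, the graph is acyclic by construction, so each target's best parent set can be chosen independently and the per-target searches can be handed to a pool of workers. All names here (`mdl_score`, `best_parents`, `learn_network`), the binary-data assumption, the simple MDL-style penalty, and the parent-set size limit are assumptions made for illustration.

```python
import itertools
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mdl_score(data, target, parents):
    """Hypothetical score (lower is better) for binary data:
    negative log-likelihood plus a simple MDL-style complexity penalty."""
    n = len(data)
    # Count (parent configuration, target value) pairs and parent configurations.
    joint = Counter((tuple(row[p] for p in parents), row[target]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    neg_loglik = -sum(
        count * math.log(count / parent_counts[cfg])
        for (cfg, _value), count in joint.items()
    )
    # One free parameter per parent configuration for a binary target.
    penalty = 0.5 * math.log(n) * (2 ** len(parents))
    return neg_loglik + penalty


def best_parents(data, target, regulators, max_parents=2):
    """Exhaustively score every parent set of size <= max_parents drawn
    from the regulators list and return the best-scoring one."""
    candidates = [r for r in regulators if r != target]
    best = (mdl_score(data, target, ()), ())
    for k in range(1, max_parents + 1):
        for subset in itertools.combinations(candidates, k):
            best = min(best, (mdl_score(data, target, subset), subset))
    return best[1]


def learn_network(data, targets, regulators, max_parents=2, workers=4):
    """Choose parents for each target independently. Targets are assumed
    not to be regulators, so the resulting graph cannot contain a cycle."""
    # A thread pool keeps the sketch simple; for real CPU-bound speedups a
    # process pool or a distributed scheduler would be the natural choice.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda t: best_parents(data, t, regulators, max_parents), targets
        )
        return dict(zip(targets, results))
```

Called with a list of per-sample dictionaries, for instance `learn_network(samples, ["t", "u"], ["r1", "r2"])`, the function returns a mapping from each target gene to its selected tuple of regulators. Because the per-target searches share no state, the work divides evenly across however many workers are available.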