Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks

Reconstruction of Gene Regulatory Networks (GRNs) from gene expression data with Probabilistic Network Models (PNMs) is an open problem. Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e., they are large-p-small-n). Moreover, dependencies of various orders coexist...

Bibliographic Details
Main Authors: Graafland, Catharina E., Gutiérrez, José M.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9636198/
https://www.ncbi.nlm.nih.gov/pubmed/36333425
http://dx.doi.org/10.1038/s41598-022-21957-z
_version_ 1784824886534864896
author Graafland, Catharina E.
Gutiérrez, José M.
author_facet Graafland, Catharina E.
Gutiérrez, José M.
author_sort Graafland, Catharina E.
collection PubMed
description Reconstruction of Gene Regulatory Networks (GRNs) from gene expression data with Probabilistic Network Models (PNMs) is an open problem. Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e., they are large-p-small-n). Moreover, dependencies of various orders coexist in the datasets. On the one hand, transcription-factor-encoding genes act like hubs and regulate target genes; on the other hand, target genes show local dependencies. In the field of Undirected Network Models (UNMs), a subclass of PNMs, the Glasso algorithm has been proposed to deal with high-dimensional microarray datasets by forcing sparsity. To overcome the problem of the complex structure of interactions, modifications of the default Glasso algorithm have been developed that integrate the expected dependency structure into the UNMs beforehand. In this work we advocate the use of a simple score-based Hill Climbing algorithm (HC) that learns Gaussian Bayesian networks based on directed acyclic graphs. We compare HC with Glasso and its variants in the UNM framework based on their capability to reconstruct GRNs from microarray data, using the synthetic benchmark dataset from the DREAM5 challenge and real-world data from the Escherichia coli genome. We conclude that dependencies in complex data are learned best by the HC algorithm, which represents them most accurately and efficiently, simultaneously modelling the strong local connections and the weaker but significant global connections that coexist in the gene expression dataset. The HC algorithm adapts intrinsically to the complex dependency structure of the dataset, without forcing a specific structure in advance.
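The score-based Hill Climbing idea named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a BIC score for linear-Gaussian nodes, uses only edge additions and deletions (no reversals), and is written for small toy data rather than large-p-small-n microarray sets. The function names `local_bic`, `creates_cycle`, and `hill_climb` are hypothetical.

```python
import numpy as np

def local_bic(data, node, parents):
    """BIC of one linear-Gaussian node given its parents (higher is better)."""
    n = data.shape[0]
    y = data[:, node]
    # Design matrix: intercept plus one column per parent.
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in sorted(parents)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid @ resid / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = X.shape[1] + 1  # regression coefficients plus the variance
    return loglik - 0.5 * n_params * np.log(n)

def creates_cycle(parents, child, new_parent):
    """True if adding the edge new_parent -> child would close a directed cycle."""
    # A cycle appears exactly when child is already an ancestor of new_parent.
    stack, seen = [new_parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v in seen:
            continue
        seen.add(v)
        stack.extend(parents[v])
    return False

def hill_climb(data, max_iter=1000):
    """Greedy search over DAGs: apply the single best add/delete until no gain."""
    p = data.shape[1]
    parents = {v: set() for v in range(p)}
    scores = {v: local_bic(data, v, parents[v]) for v in range(p)}
    for _ in range(max_iter):
        best_gain, best_move = 1e-9, None
        for child in range(p):
            for cand in range(p):
                if cand == child:
                    continue
                if cand in parents[child]:
                    new_pa = parents[child] - {cand}      # try deleting the edge
                elif not creates_cycle(parents, child, cand):
                    new_pa = parents[child] | {cand}      # try adding the edge
                else:
                    continue
                # The score is decomposable, so only the child's term changes.
                gain = local_bic(data, child, new_pa) - scores[child]
                if gain > best_gain:
                    best_gain, best_move = gain, (child, new_pa)
        if best_move is None:
            break
        child, new_pa = best_move
        parents[child] = new_pa
        scores[child] = local_bic(data, child, new_pa)
    return parents

# Toy usage on simulated data with a chain dependency x0 -> x1 -> x2.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + rng.normal(size=500)
x2 = 2.0 * x1 + rng.normal(size=500)
learned = hill_climb(np.column_stack([x0, x1, x2]))
print({child: sorted(pa) for child, pa in learned.items()})
```

Edge directions found by such a search are only identifiable up to Markov equivalence, so the learned chain may point either way. For comparison with undirected models, the undirected skeleton is the relevant output.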
format Online
Article
Text
id pubmed-9636198
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9636198 2022-11-06 Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks Graafland, Catharina E. Gutiérrez, José M. Sci Rep Article
Nature Publishing Group UK 2022-11-04 /pmc/articles/PMC9636198/ /pubmed/36333425 http://dx.doi.org/10.1038/s41598-022-21957-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Graafland, Catharina E.
Gutiérrez, José M.
Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks
title Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9636198/
https://www.ncbi.nlm.nih.gov/pubmed/36333425
http://dx.doi.org/10.1038/s41598-022-21957-z
work_keys_str_mv AT graaflandcatharinae learningcomplexdependencystructureofgeneregulatorynetworksfromhighdimensionalmicroarraydatawithgaussianbayesiannetworks
AT gutierrezjosem learningcomplexdependencystructureofgeneregulatorynetworksfromhighdimensionalmicroarraydatawithgaussianbayesiannetworks