
SEMtree: tree-based structure learning methods with structural equation models



Bibliographic Details
Main Authors: Grassi, Mario; Tarantino, Barbara
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287946/
https://www.ncbi.nlm.nih.gov/pubmed/37294820
http://dx.doi.org/10.1093/bioinformatics/btad377
author Grassi, Mario
Tarantino, Barbara
collection PubMed
description MOTIVATION: With the exponential growth of expression and protein–protein interaction (PPI) data, identifying functional modules in PPI networks that show striking changes in molecular activity or phenotypic signatures is of particular interest, because such modules reveal process-specific information that is correlated with cellular or disease states. This requires both the identification of network nodes with reliability scores and an efficient technique for locating the network regions with the highest scores. A number of heuristic methods have been suggested in the literature. We propose SEMtree(), a set of tree-based structure discovery algorithms that combine graph and statistically interpretable parameters, together with a user-friendly R package based on the structural equation model framework. RESULTS: Condition-specific changes from differential expression and gene–gene co-expression are recovered by statistical testing of node, directed edge, and directed path differences between groups. From a list of seed (i.e. disease) genes or gene P-values, perturbed modules with undirected edges are generated with five state-of-the-art active subnetwork detection methods. These modules are then supplied to causal additive trees based on Chu–Liu–Edmonds’ algorithm (Chow and Liu, Approximating discrete probability distributions with dependence trees. IEEE Trans Inform Theory 1968;14:462–7) in SEMtree() and converted into directed trees. This conversion allows the methods to be compared in terms of directed active subnetworks. We applied SEMtree() to both a coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) and simulated datasets with various differential expression patterns. Compared to existing methods, SEMtree() captures biologically relevant subnetworks with simple visualization of directed paths, good perturbation extraction, and good classifier performance.
AVAILABILITY AND IMPLEMENTATION: The SEMtree() function is implemented in the R package SEMgraph, available at https://CRAN.R-project.org/package=SEMgraph.
format Online
Article
Text
id pubmed-10287946
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10287946 2023-06-24 SEMtree: tree-based structure learning methods with structural equation models Grassi, Mario Tarantino, Barbara Bioinformatics Original Paper Oxford University Press 2023-06-09 /pmc/articles/PMC10287946/ /pubmed/37294820 http://dx.doi.org/10.1093/bioinformatics/btad377 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SEMtree: tree-based structure learning methods with structural equation models
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287946/
https://www.ncbi.nlm.nih.gov/pubmed/37294820
http://dx.doi.org/10.1093/bioinformatics/btad377
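
The abstract describes turning an undirected perturbed module into a directed tree. The following is a minimal sketch of that general idea, not the SEMtree() implementation: it uses a simpler Chow–Liu-style construction (a maximum spanning tree over undirected edge weights, then orientation of every edge away from a chosen root) rather than the Chu–Liu–Edmonds algorithm on directed weights. The edge weights, node names, and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def max_spanning_tree(edges):
    """Kruskal's algorithm on (u, v, weight) triples, heaviest edges first."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding this edge creates no cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


def orient_from_root(tree, root):
    """Direct every tree edge away from `root` using breadth-first search."""
    neighbours = defaultdict(list)
    for u, v, _ in tree:
        neighbours[u].append(v)
        neighbours[v].append(u)

    arcs, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for nb in neighbours[node]:
            if nb not in seen:
                seen.add(nb)
                arcs.append((node, nb))
                queue.append(nb)
    return arcs


# Hypothetical undirected module: weights stand in for pairwise association.
edges = [
    ("A", "B", 0.9),
    ("B", "C", 0.7),
    ("A", "C", 0.2),
    ("C", "D", 0.5),
    ("B", "D", 0.1),
]
tree = max_spanning_tree(edges)
arcs = orient_from_root(tree, "A")
print(arcs)
```

In this sketch the choice of root fixes all edge directions. A method based on Chu–Liu–Edmonds would instead start from direction-specific weights and search for the optimal arborescence.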