SEMgsa: topology-based pathway enrichment analysis with structural equation models

BACKGROUND: Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler me...

Bibliographic Details
Main Authors: Grassi, Mario; Tarantino, Barbara
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9385099/
https://www.ncbi.nlm.nih.gov/pubmed/35978279
http://dx.doi.org/10.1186/s12859-022-04884-8
_version_ 1784769525029273600
author Grassi, Mario
Tarantino, Barbara
author_facet Grassi, Mario
Tarantino, Barbara
author_sort Grassi, Mario
collection PubMed
description BACKGROUND: Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler methods that only consider pathway membership. Across the proposed software tools, there is still a need to combine high statistical power with a user-friendly framework, which makes it difficult to choose the best method for a particular experimental environment. RESULTS: We propose SEMgsa, a topology-based algorithm developed within the framework of structural equation models (SEM). SEMgsa combines the SEM p values of node-specific group effect estimates, in terms of activation or inhibition, after statistically controlling for biological relations among genes within pathways. We used SEMgsa to identify biologically relevant results in a coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) and a frontotemporal dementia (FTD) DNA methylation dataset (GEO accession: GSE53740), and compared its performance with some existing methods. SEMgsa is highly sensitive to the pathways designed for the specific disease, showing low p values ([Formula: see text]) and ranking in high positions, outperforming existing software tools. Three pathway dysregulation mechanisms were used to generate simulated expression data and evaluate the performance of methods in terms of type I error followed by their statistical power. Simulation results confirm the best overall performance of SEMgsa. CONCLUSIONS: SEMgsa is a novel yet powerful method for identifying enrichment with regard to gene expression data. It takes into account topological information and exploits pathway perturbation statistics to reveal biological information. SEMgsa is implemented in the R package SEMgraph, easily available at https://CRAN.R-project.org/package=SEMgraph. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04884-8.
format Online
Article
Text
id pubmed-9385099
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9385099 2022-08-18 SEMgsa: topology-based pathway enrichment analysis with structural equation models Grassi, Mario Tarantino, Barbara BMC Bioinformatics Software BACKGROUND: Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler methods that only consider pathway membership. Across the proposed software tools, there is still a need to combine high statistical power with a user-friendly framework, which makes it difficult to choose the best method for a particular experimental environment. RESULTS: We propose SEMgsa, a topology-based algorithm developed within the framework of structural equation models (SEM). SEMgsa combines the SEM p values of node-specific group effect estimates, in terms of activation or inhibition, after statistically controlling for biological relations among genes within pathways. We used SEMgsa to identify biologically relevant results in a coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) and a frontotemporal dementia (FTD) DNA methylation dataset (GEO accession: GSE53740), and compared its performance with some existing methods. SEMgsa is highly sensitive to the pathways designed for the specific disease, showing low p values ([Formula: see text]) and ranking in high positions, outperforming existing software tools. Three pathway dysregulation mechanisms were used to generate simulated expression data and evaluate the performance of methods in terms of type I error followed by their statistical power. Simulation results confirm the best overall performance of SEMgsa. CONCLUSIONS: SEMgsa is a novel yet powerful method for identifying enrichment with regard to gene expression data. It takes into account topological information and exploits pathway perturbation statistics to reveal biological information. SEMgsa is implemented in the R package SEMgraph, easily available at https://CRAN.R-project.org/package=SEMgraph. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04884-8. BioMed Central 2022-08-17 /pmc/articles/PMC9385099/ /pubmed/35978279 http://dx.doi.org/10.1186/s12859-022-04884-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Grassi, Mario
Tarantino, Barbara
SEMgsa: topology-based pathway enrichment analysis with structural equation models
title SEMgsa: topology-based pathway enrichment analysis with structural equation models
title_full SEMgsa: topology-based pathway enrichment analysis with structural equation models
title_fullStr SEMgsa: topology-based pathway enrichment analysis with structural equation models
title_full_unstemmed SEMgsa: topology-based pathway enrichment analysis with structural equation models
title_short SEMgsa: topology-based pathway enrichment analysis with structural equation models
title_sort semgsa: topology-based pathway enrichment analysis with structural equation models
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9385099/
https://www.ncbi.nlm.nih.gov/pubmed/35978279
http://dx.doi.org/10.1186/s12859-022-04884-8
work_keys_str_mv AT grassimario semgsatopologybasedpathwayenrichmentanalysiswithstructuralequationmodels
AT tarantinobarbara semgsatopologybasedpathwayenrichmentanalysiswithstructuralequationmodels