SEMgsa: topology-based pathway enrichment analysis with structural equation models
BACKGROUND: Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler me...
Main Authors: | Grassi, Mario, Tarantino, Barbara |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9385099/ https://www.ncbi.nlm.nih.gov/pubmed/35978279 http://dx.doi.org/10.1186/s12859-022-04884-8 |
_version_ | 1784769525029273600 |
---|---|
author | Grassi, Mario Tarantino, Barbara |
author_facet | Grassi, Mario Tarantino, Barbara |
author_sort | Grassi, Mario |
collection | PubMed |
description | BACKGROUND: Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler methods that only consider pathway membership. Among the proposed software tools, there is still a need to combine high statistical power with a user-friendly framework, which makes it difficult to choose the best method for a particular experimental environment. RESULTS: We propose SEMgsa, a topology-based algorithm developed within the framework of structural equation models. SEMgsa combines the SEM p values of node-specific group effect estimates, in terms of activation or inhibition, after statistically controlling for biological relations among genes within pathways. We used SEMgsa to identify biologically relevant results in a Coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) together with a frontotemporal dementia (FTD) DNA methylation dataset (GEO accession: GSE53740) and compared its performance with some existing methods. SEMgsa is highly sensitive to the pathways designed for the specific disease, showing low p values ([Formula: see text] ) and ranking in high positions, outperforming existing software tools. Three pathway dysregulation mechanisms were used to generate simulated expression data and evaluate the performance of methods in terms of type I error followed by their statistical power. Simulation results confirm the best overall performance of SEMgsa. CONCLUSIONS: SEMgsa is a novel and powerful method for identifying enrichment with regard to gene expression data. It takes into account topological information and exploits pathway perturbation statistics to reveal biological information. SEMgsa is implemented in the R package SEMgraph, easily available at https://CRAN.R-project.org/package=SEMgraph. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04884-8. |
format | Online Article Text |
id | pubmed-9385099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9385099 2022-08-18 SEMgsa: topology-based pathway enrichment analysis with structural equation models Grassi, Mario Tarantino, Barbara BMC Bioinformatics Software BACKGROUND: Pathway enrichment analysis is extensively used in high-throughput experimental studies to gain insight into the functional roles of pre-defined subsets of genes, proteins and metabolites. Methods that leverage information on the topology of the underlying pathways outperform simpler methods that only consider pathway membership. Among the proposed software tools, there is still a need to combine high statistical power with a user-friendly framework, which makes it difficult to choose the best method for a particular experimental environment. RESULTS: We propose SEMgsa, a topology-based algorithm developed within the framework of structural equation models. SEMgsa combines the SEM p values of node-specific group effect estimates, in terms of activation or inhibition, after statistically controlling for biological relations among genes within pathways. We used SEMgsa to identify biologically relevant results in a Coronavirus disease (COVID-19) RNA-seq dataset (GEO accession: GSE172114) together with a frontotemporal dementia (FTD) DNA methylation dataset (GEO accession: GSE53740) and compared its performance with some existing methods. SEMgsa is highly sensitive to the pathways designed for the specific disease, showing low p values ([Formula: see text] ) and ranking in high positions, outperforming existing software tools. Three pathway dysregulation mechanisms were used to generate simulated expression data and evaluate the performance of methods in terms of type I error followed by their statistical power. Simulation results confirm the best overall performance of SEMgsa. CONCLUSIONS: SEMgsa is a novel and powerful method for identifying enrichment with regard to gene expression data. It takes into account topological information and exploits pathway perturbation statistics to reveal biological information. SEMgsa is implemented in the R package SEMgraph, easily available at https://CRAN.R-project.org/package=SEMgraph. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04884-8. BioMed Central 2022-08-17 /pmc/articles/PMC9385099/ /pubmed/35978279 http://dx.doi.org/10.1186/s12859-022-04884-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Grassi, Mario Tarantino, Barbara SEMgsa: topology-based pathway enrichment analysis with structural equation models |
title | SEMgsa: topology-based pathway enrichment analysis with structural equation models |
title_full | SEMgsa: topology-based pathway enrichment analysis with structural equation models |
title_fullStr | SEMgsa: topology-based pathway enrichment analysis with structural equation models |
title_full_unstemmed | SEMgsa: topology-based pathway enrichment analysis with structural equation models |
title_short | SEMgsa: topology-based pathway enrichment analysis with structural equation models |
title_sort | semgsa: topology-based pathway enrichment analysis with structural equation models |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9385099/ https://www.ncbi.nlm.nih.gov/pubmed/35978279 http://dx.doi.org/10.1186/s12859-022-04884-8 |
work_keys_str_mv | AT grassimario semgsatopologybasedpathwayenrichmentanalysiswithstructuralequationmodels AT tarantinobarbara semgsatopologybasedpathwayenrichmentanalysiswithstructuralequationmodels |