
PriPath: identifying dysregulated pathways from differential gene expression via grouping, scoring, and modeling with an embedded feature selection approach

BACKGROUND: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism...


Bibliographic Details
Main Authors: Yousef, Malik, Ozdemir, Fatma, Jaber, Amhar, Allmer, Jens, Bakir-Gungor, Burcu
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947447/
https://www.ncbi.nlm.nih.gov/pubmed/36823571
http://dx.doi.org/10.1186/s12859-023-05187-2
author Yousef, Malik
Ozdemir, Fatma
Jaber, Amhar
Allmer, Jens
Bakir-Gungor, Burcu
collection PubMed
description BACKGROUND: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto encyclopedia of genes and genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in the disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches may only employ gene expression data and need to pay more attention to some of the existing knowledge stored in KEGG pathways for detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases.
RESULTS: PriPath is a novel approach that transfers the generic process of grouping and scoring, followed by modeling to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways are utilized as the grouping function as part of a machine learning algorithm for selecting the most significant KEGG pathways. A machine learning model is trained to differentiate between diseases and controls using those groups. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our proposed approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have comparatively evaluated the performance of PriPath against other tools, which are similar in their merit. For each dataset, we manually confirmed the top results of PriPath in the literature and found that most predictions can be supported by previous experimental research.
CONCLUSIONS: PriPath can thus aid in determining dysregulated pathways, which applies to medical diagnostics. In the future, we aim to advance this approach so that it can perform patient stratification based on gene expression and identify druggable targets. Thereby, we cover two aspects of precision medicine.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05187-2.
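The grouping, scoring, and modeling idea summarized in the description above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The nearest-centroid classifier and the leave-one-out accuracy score are stand-ins chosen for brevity, and the gene names (g1 to g4), pathway names (pathway_A to pathway_C), and expression values are hypothetical toy data rather than real KEGG content.

```python
from statistics import mean

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def predict(train_x, train_y, x):
    """Nearest-centroid prediction (a stand-in classifier, not PriPath's)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(x, y):
    """Leave-one-out accuracy; assumes at least two samples per class."""
    hits = 0
    for i in range(len(x)):
        hits += predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i]) == y[i]
    return hits / len(x)

def submatrix(expr, genes):
    """Samples-by-genes matrix restricted to the given genes."""
    return [list(row) for row in zip(*(expr[g] for g in genes))]

def rank_pathways(expr, labels, pathways):
    """Group genes by pathway, score each group, sort best first."""
    scored = []
    for name, members in pathways.items():
        genes = [g for g in members if g in expr]  # grouping step
        if genes:
            score = loo_accuracy(submatrix(expr, genes), labels)  # scoring step
            scored.append((score, name))
    return sorted(scored, key=lambda t: (-t[0], t[1]))

def top_genes(ranked, pathways, k):
    """Union of the genes in the k best-scoring pathways, order preserved."""
    genes = []
    for _, name in ranked[:k]:
        for g in pathways[name]:
            if g not in genes:
                genes.append(g)
    return genes

# Hypothetical toy data: 6 samples, label 1 = disease, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],  # separates the classes
    "g2": [4.8, 5.1, 5.0, 1.2, 0.8, 1.0],  # separates the classes
    "g3": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # uninformative
    "g4": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # uninformative
}
# g2 and g3 each belong to two pathways, mirroring genes shared across pathways.
pathways = {
    "pathway_A": ["g1", "g2"],
    "pathway_B": ["g3", "g4"],
    "pathway_C": ["g2", "g3"],
}

ranked = rank_pathways(expr, labels, pathways)
selected = top_genes(ranked, pathways, k=2)
# Modeling step: train on the selected genes and classify a new sample.
prediction = predict(submatrix(expr, selected), labels, [5.0, 5.0, 1.5])
print(ranked, selected, prediction)
```

Because the pathway score is computed by a classifier on held-out samples, pathway selection is embedded in the learning procedure instead of being a separate filtering pass, which is the sense of "embedded feature selection" in the title.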
format Online
Article
Text
id pubmed-9947447
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9947447 2023-02-23
BMC Bioinformatics
Research Article
BioMed Central 2023-02-23
/pmc/articles/PMC9947447/
/pubmed/36823571
http://dx.doi.org/10.1186/s12859-023-05187-2
Text
en
© The Author(s) 2023
https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title PriPath: identifying dysregulated pathways from differential gene expression via grouping, scoring, and modeling with an embedded feature selection approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9947447/
https://www.ncbi.nlm.nih.gov/pubmed/36823571
http://dx.doi.org/10.1186/s12859-023-05187-2