CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis

Most traditional gene selection approaches are borrowed from other fields such as statistics and computer science. However, they do not prioritize biologically relevant genes, since the ultimate goal is to determine features that optimize model performance metrics, not to build a biologically m...

Bibliographic Details
Main authors: Yousef, Malik; Ülgen, Ege; Uğur Sezerman, Osman
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959595/
https://www.ncbi.nlm.nih.gov/pubmed/33816987
http://dx.doi.org/10.7717/peerj-cs.336
_version_ 1783664983504060416
author Yousef, Malik
Ülgen, Ege
Uğur Sezerman, Osman
collection PubMed
description Most traditional gene selection approaches are borrowed from other fields such as statistics and computer science. However, they do not prioritize biologically relevant genes, since the ultimate goal is to determine features that optimize model performance metrics, not to build a biologically meaningful model. Therefore, there is a pressing need for new computational tools that integrate biological knowledge about the data into the process of gene selection and machine learning. Integrative gene selection enables the incorporation of biological domain knowledge from external biological resources. In this study, we propose a new computational approach named CogNet, an integrative gene selection tool that exploits biological knowledge to group genes for the computational modeling tasks of ranking and classification. In CogNet, pathfindR serves as the biological grouping tool, allowing the main algorithm to rank active-subnetwork-oriented KEGG pathway enrichment analysis results in order to build a biologically relevant model. CogNet provides a list of significant KEGG pathways that can classify the data with very high accuracy. The list also provides the differentially expressed genes belonging to these pathways, which are used as features in the classification problem. The list facilitates deeper analysis and better interpretability of the role of KEGG pathways in classifying the data, thus better establishing the biological relevance of these differentially expressed genes. Even though the main aim of our study is not to improve the accuracy of any existing tool, CogNet outperforms a similar approach called maTE while performing comparably to other similar tools, including SVM-RCE. CogNet was tested on 13 gene expression datasets concerning a variety of diseases.
format Online
Article
Text
id pubmed-7959595
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7959595 2021-04-02 CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis. Yousef, Malik; Ülgen, Ege; Uğur Sezerman, Osman. PeerJ Comput Sci. Bioinformatics. PeerJ Inc. 2021-02-22. /pmc/articles/PMC7959595/ /pubmed/33816987 http://dx.doi.org/10.7717/peerj-cs.336 Text en. © 2021 Yousef et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959595/
https://www.ncbi.nlm.nih.gov/pubmed/33816987
http://dx.doi.org/10.7717/peerj-cs.336