
Optimisation Models for Pathway Activity Inference in Cancer

SIMPLE SUMMARY: Subtype classification and prognostic prediction are key research targets in complex diseases such as cancer. In this work, an optimisation model was designed to infer the activity of biological pathways from gene expression values. The optimisation model enables the pathway activity...


Bibliographic Details
Main authors: Chen, Yongnan, Liu, Songsong, Papageorgiou, Lazaros G., Theofilatos, Konstantinos, Tsoka, Sophia
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10046797/
https://www.ncbi.nlm.nih.gov/pubmed/36980673
http://dx.doi.org/10.3390/cancers15061787
_version_ 1785013762992898048
author Chen, Yongnan
Liu, Songsong
Papageorgiou, Lazaros G.
Theofilatos, Konstantinos
Tsoka, Sophia
author_facet Chen, Yongnan
Liu, Songsong
Papageorgiou, Lazaros G.
Theofilatos, Konstantinos
Tsoka, Sophia
author_sort Chen, Yongnan
collection PubMed
description SIMPLE SUMMARY: Subtype classification and prognostic prediction are key research targets in complex diseases such as cancer. In this work, an optimisation model was designed to infer the activity of biological pathways from gene expression values. The optimisation model enables the pathway activity values to separate the sample subtypes to the greatest extent, thereby improving sample classification accuracy. The proposed model was evaluated on cancer molecular subtype classification, robustness to noisy data and survival prediction, and allowed the identification of disease-important genes and pathways. ABSTRACT: Background: With advances in high-throughput technologies, there has been an enormous increase in data related to profiling the activity of molecules in disease. While such data provide more comprehensive information on cellular actions, their large volume and complexity pose difficulty in accurate classification of disease phenotypes. Therefore, novel modelling methods that can improve accuracy while offering interpretable means of analysis are required. Biological pathways can be used to incorporate a priori knowledge of biological interactions to decrease data dimensionality and increase the biological interpretability of machine learning models. Methodology: A mathematical optimisation model is proposed for pathway activity inference towards precise disease phenotype prediction and is applied to RNA-Seq datasets. The model is based on mixed-integer linear programming (MILP) mathematical optimisation principles and infers pathway activity as the linear combination of pathway member gene expression, multiplying expression values with model-determined gene weights that are optimised to maximise discrimination of phenotype classes and minimise incorrect sample allocation. 
Results: The model is evaluated on the transcriptome of breast and colorectal cancer, and yields solutions of good optimality as well as good prediction performance on related cancer subtypes. Two baseline pathway activity inference methods and three advanced methods are used for comparison. Sample prediction accuracy, robustness against noisy expression data, and survival analysis suggest competitive prediction performance of our model while providing interpretability and insight on key pathways and genes. Overall, our work demonstrates that the flexible nature of mathematical programming lends itself well to developing efficient computational strategies for pathway activity inference and disease subtype prediction.
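The abstract describes pathway activity as a linear combination of pathway member gene expression values, with gene weights chosen to separate phenotype classes and reduce incorrect sample allocation. The following is a minimal sketch of that idea only, not the authors' MILP formulation: the gene names, expression values, weights, threshold, and the helper functions `pathway_activity` and `misallocated` are all hypothetical, and the weights are fixed by hand here rather than optimised by a solver.

```python
from typing import Dict, List, Sequence


def pathway_activity(expression: Dict[str, Sequence[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Return one activity value per sample: the weighted sum of member gene expression."""
    genes = list(weights)
    n_samples = len(expression[genes[0]])
    return [sum(weights[g] * expression[g][s] for g in genes)
            for s in range(n_samples)]


def misallocated(activity: Sequence[float],
                 labels: Sequence[int],
                 threshold: float) -> int:
    """Count samples whose activity falls on the wrong side of the threshold.

    Assumption for this sketch: class 1 is expected above the threshold,
    class 0 at or below it.
    """
    return sum((a > threshold) != bool(y) for a, y in zip(activity, labels))


# Toy data (hypothetical): three member genes of one pathway, four samples.
expression = {
    "GENE_A": [2.0, 1.5, 0.5, 0.0],
    "GENE_B": [0.0, 1.0, 2.0, 3.0],
    "GENE_C": [1.0, 1.0, 1.0, 1.0],
}
labels = [1, 1, 0, 0]  # two phenotype classes

# Hand-picked weights standing in for the values an optimiser would determine.
weights = {"GENE_A": 1.0, "GENE_B": -1.0, "GENE_C": 0.0}

activity = pathway_activity(expression, weights)
errors = misallocated(activity, labels, threshold=0.0)
print("activity per sample:", activity)
print("misallocated samples:", errors)
```

In an optimisation setting, the weights would be decision variables and a count such as `errors` would be part of the objective to minimise; a zero weight, as for the hypothetical `GENE_C`, corresponds to a gene that does not contribute to the pathway's activity value.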
format Online
Article
Text
id pubmed-10046797
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10046797 2023-03-29 Optimisation Models for Pathway Activity Inference in Cancer Chen, Yongnan Liu, Songsong Papageorgiou, Lazaros G. Theofilatos, Konstantinos Tsoka, Sophia Cancers (Basel) Article MDPI 2023-03-15 /pmc/articles/PMC10046797/ /pubmed/36980673 http://dx.doi.org/10.3390/cancers15061787 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Chen, Yongnan
Liu, Songsong
Papageorgiou, Lazaros G.
Theofilatos, Konstantinos
Tsoka, Sophia
Optimisation Models for Pathway Activity Inference in Cancer
title Optimisation Models for Pathway Activity Inference in Cancer
title_full Optimisation Models for Pathway Activity Inference in Cancer
title_fullStr Optimisation Models for Pathway Activity Inference in Cancer
title_full_unstemmed Optimisation Models for Pathway Activity Inference in Cancer
title_short Optimisation Models for Pathway Activity Inference in Cancer
title_sort optimisation models for pathway activity inference in cancer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10046797/
https://www.ncbi.nlm.nih.gov/pubmed/36980673
http://dx.doi.org/10.3390/cancers15061787