
Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework

BACKGROUND: Applying machine learning methods on microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches where expression of genes were treated independently...


Bibliographic Details
Main authors: Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4269079/
https://www.ncbi.nlm.nih.gov/pubmed/25475756
http://dx.doi.org/10.1186/s12859-014-0390-2
author Yang, Lingjian
Ainali, Chrysanthi
Tsoka, Sophia
Papageorgiou, Lazaros G
collection PubMed
description BACKGROUND: Applying machine learning methods to microarray gene expression profiles for disease classification problems is a popular way to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches, in which the expression of genes was treated independently, suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in the literature are either unsupervised or applied to two-class datasets, there is good scope to address such limitations by proposing novel methodologies.
RESULTS: A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of the expression of its constituent genes. Gene weights are determined by the optimisation model in such a way that the resulting pathway activity has optimal discriminative power with regard to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile.
CONCLUSIONS: The model was evaluated on a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well on multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0390-2) contains supplementary material, which is available to authorized users.
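As a reading aid for the description above, the following is a minimal sketch of what "a weighted linear summation of expression of its constituent genes" amounts to for one pathway. It is an illustration under stated assumptions, not the authors' model: the function name `pathway_activity`, the gene names, the expression values and the weights are all hypothetical, and the weights are simply given here, whereas in the article they are determined by a mathematical programming model that this record does not reproduce.

```python
# Illustrative sketch only: the weights below are made-up placeholders,
# not values produced by the optimisation model described in the abstract.

def pathway_activity(expression, weights):
    """Return one activity value per sample as a weighted sum over genes.

    expression: dict mapping gene name -> list of expression values (one per sample)
    weights:    dict mapping gene name -> weight; genes absent here contribute nothing
    """
    n_samples = len(next(iter(expression.values())))
    activity = [0.0] * n_samples
    for gene, w in weights.items():
        for s, value in enumerate(expression[gene]):
            activity[s] += w * value
    return activity


# Hypothetical pathway with three genes measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 0.5, 1.5],
    "geneB": [0.0, 1.0, 2.0, 1.0],
    "geneC": [3.0, 0.0, 1.0, 2.0],
}
# Only two genes carry a non-zero weight, mimicking a cap on participating genes.
weights = {"geneA": 0.5, "geneC": -1.0}

activity = pathway_activity(expression, weights)
print(activity)
```

Each sample ends up with a single number per pathway, which is the low-dimensional profile a classifier would then be trained on. Leaving `geneB` out of `weights` loosely mirrors the user-specified limit on how many genes may participate in a pathway's activity.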
format Online
Article
Text
id pubmed-4269079
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4269079 2014-12-18 BMC Bioinformatics Research Article BioMed Central 2014-12-05 /pmc/articles/PMC4269079/ /pubmed/25475756 http://dx.doi.org/10.1186/s12859-014-0390-2 Text en © Yang et al.; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4269079/
https://www.ncbi.nlm.nih.gov/pubmed/25475756
http://dx.doi.org/10.1186/s12859-014-0390-2