
Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases

MOTIVATION: Gene expression data are commonly used at the intersection of cancer research and machine learning for better understanding of the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data due to their ability to scale and remove the n...


Bibliographic Details
Main Authors:	Scherer, Paul; Trębacz, Maja; Simidjievski, Nikola; Viñas, Ramon; Shams, Zohreh; Terre, Helena Andres; Jamnik, Mateja; Liò, Pietro
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826027/ https://www.ncbi.nlm.nih.gov/pubmed/34888618 http://dx.doi.org/10.1093/bioinformatics/btab830

author	Scherer, Paul; Trębacz, Maja; Simidjievski, Nikola; Viñas, Ramon; Shams, Zohreh; Terre, Helena Andres; Jamnik, Mateja; Liò, Pietro
author_sort	Scherer, Paul
collection	PubMed
description	MOTIVATION: Gene expression data are commonly used at the intersection of cancer research and machine learning to better understand the molecular status of tumour tissue. Deep learning predictive models have been employed for gene expression data because they scale well and remove the need for manual feature engineering. However, gene expression data are often very high dimensional and noisy, and come with a low number of samples. This poses significant problems for learning algorithms: models often overfit, learn noise and struggle to capture biologically relevant information. In this article, we utilize external biological knowledge embedded within the structures of gene interaction graphs, such as protein–protein interaction (PPI) networks, to guide the construction of predictive models. RESULTS: We present Gene Interaction Network Constrained Construction (GINCCo), an unsupervised method for automated construction of computational graph models for gene expression data that are structurally constrained by prior knowledge of gene interaction networks. We employ this methodology in a case study on incorporating a PPI network in cancer phenotype prediction tasks. Our computational graphs are constructed using topological clustering algorithms on the PPI networks, which incorporate inductive biases stemming from network biology research on protein complex discovery. Each node in the GINCCo computational graph represents a biological entity such as a gene, a candidate protein complex or a phenotype, instead of an arbitrary hidden node of a neural network. This provides a biologically relevant mechanism for model regularization, yielding strong predictive performance while drastically reducing the number of model parameters and enabling guided post-hoc enrichment analyses of influential gene sets with respect to target phenotypes. Our experiments analysing a variety of cancer phenotypes show that GINCCo often outperforms support vector machines, Fully Connected Multi-layer Perceptrons (MLPs) and Randomly Connected MLPs despite greatly reduced model complexity. AVAILABILITY AND IMPLEMENTATION: https://github.com/paulmorio/gincco contains the source code for our approach. We also release a library with algorithms for protein complex discovery within PPI networks at https://github.com/paulmorio/protclus. This repository contains implementations of the clustering algorithms used in this article. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-8826027
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8826027 2022-02-09 Bioinformatics Original Papers Oxford University Press 2021-12-09 /pmc/articles/PMC8826027/ /pubmed/34888618 http://dx.doi.org/10.1093/bioinformatics/btab830 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826027/ https://www.ncbi.nlm.nih.gov/pubmed/34888618 http://dx.doi.org/10.1093/bioinformatics/btab830
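
The structural constraint described in the abstract (genes feed only into the candidate protein complexes they belong to, and complexes feed into phenotypes) can be illustrated with a masked weight matrix. The following is a minimal sketch under stated assumptions, not the GINCCo implementation: the function names `build_mask` and `forward`, the hand-written cluster memberships, the ReLU activation and the dense final layer are all hypothetical choices made for illustration. In the article the clusters come from topological clustering of a PPI network; here they are simply written out by hand.

```python
import numpy as np

def build_mask(n_genes, clusters):
    """Binary gene-by-cluster mask: entry (g, c) is 1 only if gene g is in cluster c."""
    mask = np.zeros((n_genes, len(clusters)))
    for c, members in enumerate(clusters):
        mask[sorted(members), c] = 1.0
    return mask

def forward(x, w_gene_cluster, mask, w_cluster_out):
    """Forward pass in which gene-to-cluster weights outside the mask are zeroed."""
    hidden = np.maximum(0.0, x @ (w_gene_cluster * mask))  # one hidden node per cluster
    return hidden @ w_cluster_out                           # cluster-to-phenotype scores

# Toy setup (assumed values): 6 genes, 3 hand-written clusters, 2 phenotype classes.
# Gene 2 sits in two clusters to show that memberships may overlap.
clusters = [{0, 1, 2}, {2, 3}, {4, 5}]
mask = build_mask(6, clusters)

rng = np.random.default_rng(0)
w1 = rng.normal(size=(6, 3))   # gene -> cluster weights (masked when used)
w2 = rng.normal(size=(3, 2))   # cluster -> phenotype weights
x = rng.normal(size=(4, 6))    # 4 samples of 6 gene expression values

logits = forward(x, w1, mask, w2)

# Effective parameter counts (biases omitted): masked model versus a dense MLP
n_sparse = int(mask.sum()) + w2.size
n_dense = w1.size + w2.size
```

In this toy case the masked model uses 13 effective weights against 24 for the dense equivalent; with thousands of genes and small clusters the gap would be much larger, which is the kind of parameter reduction the abstract refers to.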
