Mining hidden knowledge: embedding models of cause–effect relationships curated from the biomedical literature

MOTIVATION: We explore the use of literature-curated signed causal gene expression and gene–function relationships to construct unsupervised embeddings of genes, biological functions and diseases. Our goal is to prioritize and predict activating and inhibiting functional associations of genes and to...

Bibliographic Details
Main authors: Krämer, Andreas; Green, Jeff; Billaud, Jean-Noël; Pasare, Nicoleta Andreea; Jones, Martin; Tugendreich, Stuart
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710590/
https://www.ncbi.nlm.nih.gov/pubmed/36699407
http://dx.doi.org/10.1093/bioadv/vbac022
author Krämer, Andreas
Green, Jeff
Billaud, Jean-Noël
Pasare, Nicoleta Andreea
Jones, Martin
Tugendreich, Stuart
collection PubMed
description MOTIVATION: We explore the use of literature-curated signed causal gene expression and gene–function relationships to construct unsupervised embeddings of genes, biological functions and diseases. Our goal is to prioritize and predict activating and inhibiting functional associations of genes and to discover hidden relationships between functions. As an application, we are particularly interested in the automatic construction of networks that capture relevant biology in a given disease context. RESULTS: We evaluated several unsupervised gene embedding models leveraging literature-curated signed causal gene expression findings. Using linear regression, we show that, based on these gene embeddings, gene–function relationships can be predicted with about 95% precision for the highest scoring genes. Function embedding vectors, derived from parameters of the linear regression model, allow inference of relationships between different functions or diseases. We show for several diseases that gene and function embeddings can be used to recover key drivers of pathogenesis, as well as underlying cellular and physiological processes. These results are presented as disease-centric networks of genes and functions. To illustrate the applicability of our approach to other machine learning tasks, we also computed embeddings for drug molecules, which were then tested using a simple neural network to predict drug–disease associations. AVAILABILITY AND IMPLEMENTATION: Python implementations of the gene and function embedding algorithms operating on a subset of our literature-curated content as well as other code used for this paper are made available as part of the Supplementary data. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-9710590
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9710590 2023-01-24 Bioinform Adv Original Paper Oxford University Press 2022-04-07 /pmc/articles/PMC9710590/ /pubmed/36699407 http://dx.doi.org/10.1093/bioadv/vbac022 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mining hidden knowledge: embedding models of cause–effect relationships curated from the biomedical literature
topic Original Paper