Ensembles of knowledge graph embedding models improve predictions for drug discovery

Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at leas...

Bibliographic Details
Main authors: Rivas-Barragan, Daniel; Domingo-Fernández, Daniel; Gadiya, Yojana; Healey, David
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677479/
https://www.ncbi.nlm.nih.gov/pubmed/36384050
http://dx.doi.org/10.1093/bib/bbac481
author Rivas-Barragan, Daniel
Domingo-Fernández, Daniel
Gadiya, Yojana
Healey, David
collection PubMed
description Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug–disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can later be tested in clinical trials following extensive experimental validation. However, because evaluating every prediction is infeasible and only a small number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning to KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug–disease triples on two independent biomedical KGs designed for drug discovery. Next, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery.
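The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). The function names, the toy triples and scores, the use of z-scores for the "distribution" variant, the use of normalized ranks for the "position" variant, and the choice to report precision as a fraction of the top k are all assumptions made for illustration. Higher scores are assumed to mean more plausible triples.

```python
from statistics import mean, pstdev


def zscore_normalize(scores):
    """Distribution-based rescaling (assumed variant): zero mean, unit variance per model."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero when all scores are equal
    return {triple: (s - mu) / sigma for triple, s in scores.items()}


def rank_normalize(scores):
    """Position-based rescaling (assumed variant): best triple gets 1.0, worst gets 1/n."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken by triple
    n = len(ordered)
    return {triple: (n - i) / n for i, triple in enumerate(ordered)}


def ensemble(model_scores, normalize):
    """Average the normalized scores of each triple scored by every model."""
    normalized = [normalize(s) for s in model_scores]
    shared = set.intersection(*(set(s) for s in normalized))
    return {t: mean(n[t] for n in normalized) for t in shared}


def precision_at_k(scores, positives, k):
    """Fraction of the k highest-scored triples that are known positives."""
    top = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return sum(t in positives for t in top) / k


# Hypothetical toy data: two models scoring the same four drug-disease triples
# on different, incomparable scales.
d1, d2 = ("drug1", "treats", "diseaseX"), ("drug2", "treats", "diseaseX")
d3, d4 = ("drug3", "treats", "diseaseX"), ("drug4", "treats", "diseaseX")
model_a = {d1: 9.0, d2: 2.0, d3: 1.0, d4: 0.5}
model_b = {d1: -0.1, d2: -0.9, d3: -0.2, d4: -0.8}
positives = {d1, d3}

by_rank = ensemble([model_a, model_b], rank_normalize)
by_zscore = ensemble([model_a, model_b], zscore_normalize)
print(precision_at_k(model_a, positives, 2))
print(precision_at_k(by_rank, positives, 2))
print(precision_at_k(by_zscore, positives, 2))
```

Normalizing each model's scores before averaging matters because raw KGEM scores are generally not comparable across models. In this toy case, averaging the raw scores would let the first model dominate simply because its scale is larger.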
format Online
Article
Text
id pubmed-9677479
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9677479 2022-11-21 Ensembles of knowledge graph embedding models improve predictions for drug discovery Rivas-Barragan, Daniel; Domingo-Fernández, Daniel; Gadiya, Yojana; Healey, David Brief Bioinform Problem Solving Protocol Oxford University Press 2022-11-16 /pmc/articles/PMC9677479/ /pubmed/36384050 http://dx.doi.org/10.1093/bib/bbac481 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Ensembles of knowledge graph embedding models improve predictions for drug discovery
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677479/
https://www.ncbi.nlm.nih.gov/pubmed/36384050
http://dx.doi.org/10.1093/bib/bbac481