Ensembles of knowledge graph embedding models improve predictions for drug discovery
Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at leas...
Main authors: | Rivas-Barragan, Daniel; Domingo-Fernández, Daniel; Gadiya, Yojana; Healey, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Problem Solving Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677479/ https://www.ncbi.nlm.nih.gov/pubmed/36384050 http://dx.doi.org/10.1093/bib/bbac481 |
_version_ | 1784833820186378240 |
---|---|
author | Rivas-Barragan, Daniel Domingo-Fernández, Daniel Gadiya, Yojana Healey, David |
author_facet | Rivas-Barragan, Daniel Domingo-Fernández, Daniel Gadiya, Yojana Healey, David |
author_sort | Rivas-Barragan, Daniel |
collection | PubMed |
description | Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug–disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can be later tested in clinical trials following extensive experimental validation. However, given the infeasibility of evaluating each of these predictions and that only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning on KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug–disease triples on two independent biomedical KGs designed for drug discovery. Following, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery. |
format | Online Article Text |
id | pubmed-9677479 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-96774792022-11-21 Ensembles of knowledge graph embedding models improve predictions for drug discovery Rivas-Barragan, Daniel Domingo-Fernández, Daniel Gadiya, Yojana Healey, David Brief Bioinform Problem Solving Protocol Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug–disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can be later tested in clinical trials following extensive experimental validation. However, given the infeasibility of evaluating each of these predictions and that only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning on KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug–disease triples on two independent biomedical KGs designed for drug discovery. Following, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. 
We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery. Oxford University Press 2022-11-16 /pmc/articles/PMC9677479/ /pubmed/36384050 http://dx.doi.org/10.1093/bib/bbac481 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Problem Solving Protocol Rivas-Barragan, Daniel Domingo-Fernández, Daniel Gadiya, Yojana Healey, David Ensembles of knowledge graph embedding models improve predictions for drug discovery |
title | Ensembles of knowledge graph embedding models improve predictions for drug discovery |
title_full | Ensembles of knowledge graph embedding models improve predictions for drug discovery |
title_fullStr | Ensembles of knowledge graph embedding models improve predictions for drug discovery |
title_full_unstemmed | Ensembles of knowledge graph embedding models improve predictions for drug discovery |
title_short | Ensembles of knowledge graph embedding models improve predictions for drug discovery |
title_sort | ensembles of knowledge graph embedding models improve predictions for drug discovery |
topic | Problem Solving Protocol |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677479/ https://www.ncbi.nlm.nih.gov/pubmed/36384050 http://dx.doi.org/10.1093/bib/bbac481 |
work_keys_str_mv | AT rivasbarragandaniel ensemblesofknowledgegraphembeddingmodelsimprovepredictionsfordrugdiscovery AT domingofernandezdaniel ensemblesofknowledgegraphembeddingmodelsimprovepredictionsfordrugdiscovery AT gadiyayojana ensemblesofknowledgegraphembeddingmodelsimprovepredictionsfordrugdiscovery AT healeydavid ensemblesofknowledgegraphembeddingmodelsimprovepredictionsfordrugdiscovery |
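
The description above mentions ensemble methods that aggregate model predictions using either the distribution or the position of the predicted triple scores, evaluated by the precision of the top predictions. The record does not say which specific methods the authors used, so the following is only a minimal sketch under stated assumptions: z-score averaging stands in for a distribution-based ensemble, mean rank stands in for a position-based one, and every model is assumed to score the same set of candidate triples. All function names are hypothetical and are not taken from the authors' repository.

```python
from statistics import mean, pstdev


def zscore_ensemble(scores_by_model):
    """Distribution-based sketch: average each triple's z-score across models.

    scores_by_model is a list of dicts mapping a triple to that model's score
    (higher = more plausible). Assumption: all models score the same triples.
    """
    combined = {}
    for scores in scores_by_model:
        values = list(scores.values())
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # avoid division by zero for constant scores
        for triple, score in scores.items():
            combined.setdefault(triple, []).append((score - mu) / sigma)
    return {triple: mean(z) for triple, z in combined.items()}


def rank_ensemble(scores_by_model):
    """Position-based sketch: average each triple's rank (1 = best) across models.

    The mean rank is negated so that, as with scores, higher means better.
    """
    combined = {}
    for scores in scores_by_model:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, triple in enumerate(ordered, start=1):
            combined.setdefault(triple, []).append(position)
    return {triple: -mean(ranks) for triple, ranks in combined.items()}


def precision_at_k(ensemble_scores, true_triples, k):
    """Fraction of the k highest-scored triples that are known true positives."""
    top_k = sorted(ensemble_scores, key=ensemble_scores.get, reverse=True)[:k]
    return sum(triple in true_triples for triple in top_k) / k
```

Rank averaging ignores how far apart the raw scores are, which makes it insensitive to models whose scores live on very different scales; z-score averaging keeps that information but assumes the per-model score distributions are comparable after standardization.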