Evolving knowledge graph similarity for supervised learning in complex biomedical domains

BACKGROUND: In recent years, biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs. Data mining approaches that work with knowledge graphs have been proposed, but they are based on vector representations that do not capture the full...

Bibliographic Details
Main Authors: Sousa, Rita T., Silva, Sara, Pesquita, Catia
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6942314/
https://www.ncbi.nlm.nih.gov/pubmed/31900127
http://dx.doi.org/10.1186/s12859-019-3296-1
_version_ 1783484678809845760
author Sousa, Rita T.
Silva, Sara
Pesquita, Catia
author_facet Sousa, Rita T.
Silva, Sara
Pesquita, Catia
author_sort Sousa, Rita T.
collection PubMed
description BACKGROUND: In recent years, biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs. Data mining approaches that work with knowledge graphs have been proposed, but they are based on vector representations that do not capture the full underlying semantics. An alternative is to use machine learning approaches that explore semantic similarity. However, since ontologies can model multiple perspectives, semantic similarity computations for a given learning task need to be fine-tuned to account for this. Obtaining the best combination of semantic similarity aspects for each learning task is not trivial and typically depends on expert knowledge. RESULTS: We have developed a novel approach, evoKGsim, that applies Genetic Programming over a set of semantic similarity features, each based on a semantic aspect of the data, to obtain the best combination for a given supervised learning task. The approach was evaluated on several benchmark datasets for protein-protein interaction prediction using the Gene Ontology as the knowledge graph to support semantic similarity, and it outperformed competing strategies, including manually selected combinations of semantic aspects emulating expert knowledge. evoKGsim was also able to learn species-agnostic models with different combinations of species for training and testing, effectively addressing the limitations of predicting protein-protein interactions for species with fewer known interactions. CONCLUSIONS: evoKGsim can overcome one of the limitations in knowledge graph-based semantic similarity applications: the need to expertly select which aspects should be taken into account for a given application. Applying this methodology to protein-protein interaction prediction proved successful, paving the way to broader applications.
format Online
Article
Text
id pubmed-6942314
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-69423142020-01-07 Evolving knowledge graph similarity for supervised learning in complex biomedical domains Sousa, Rita T. Silva, Sara Pesquita, Catia BMC Bioinformatics Research Article BACKGROUND: In recent years, biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs. Data mining approaches that work with knowledge graphs have been proposed, but they are based on vector representations that do not capture the full underlying semantics. An alternative is to use machine learning approaches that explore semantic similarity. However, since ontologies can model multiple perspectives, semantic similarity computations for a given learning task need to be fine-tuned to account for this. Obtaining the best combination of semantic similarity aspects for each learning task is not trivial and typically depends on expert knowledge. RESULTS: We have developed a novel approach, evoKGsim, that applies Genetic Programming over a set of semantic similarity features, each based on a semantic aspect of the data, to obtain the best combination for a given supervised learning task. The approach was evaluated on several benchmark datasets for protein-protein interaction prediction using the Gene Ontology as the knowledge graph to support semantic similarity, and it outperformed competing strategies, including manually selected combinations of semantic aspects emulating expert knowledge. evoKGsim was also able to learn species-agnostic models with different combinations of species for training and testing, effectively addressing the limitations of predicting protein-protein interactions for species with fewer known interactions. CONCLUSIONS: evoKGsim can overcome one of the limitations in knowledge graph-based semantic similarity applications: the need to expertly select which aspects should be taken into account for a given application. 
Applying this methodology to protein-protein interaction prediction proved successful, paving the way to broader applications. BioMed Central 2020-01-03 /pmc/articles/PMC6942314/ /pubmed/31900127 http://dx.doi.org/10.1186/s12859-019-3296-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Sousa, Rita T.
Silva, Sara
Pesquita, Catia
Evolving knowledge graph similarity for supervised learning in complex biomedical domains
title Evolving knowledge graph similarity for supervised learning in complex biomedical domains
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6942314/
https://www.ncbi.nlm.nih.gov/pubmed/31900127
http://dx.doi.org/10.1186/s12859-019-3296-1