Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations

Bibliographic Details
Main authors: Smaili, Fatima Zohra; Gao, Xin; Hoehndorf, Robert
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022543/
https://www.ncbi.nlm.nih.gov/pubmed/29949999
http://dx.doi.org/10.1093/bioinformatics/bty259
collection PubMed
description MOTIVATION: Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms; notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications.
RESULTS: We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the Gene Ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein–protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high-quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state of the art in several predictive applications in which ontologies are involved.
AVAILABILITY AND IMPLEMENTATION: https://github.com/bio-ontology-research-group/onto2vec
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
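The abstract does not spell out the pipeline. What follows is a minimal Python sketch of two steps that need no trained model. It is not the authors' code, which is in the repository linked above.

The sketch makes these assumptions:
- Axioms and annotations are serialized as token "sentences" in one corpus, so that a Word2Vec-style embedder (not run here) can place proteins, GO classes and axiom keywords in a shared vector space.
- Interactions are predicted by ranking protein pairs on cosine similarity and scored with ROC AUC.
- The relation token `hasFunction`, the identifiers `P1` to `P3` and `GO_A` to `GO_C`, and the vectors are all hypothetical toy data.

```python
"""Sketch of two Onto2Vec-style steps that need no trained model.

ASSUMPTIONS (not stated in the record): axioms and annotations become token
sentences in one corpus; embeddings come from some Word2Vec-style model
trained elsewhere; all identifiers and vectors below are hypothetical.
"""
import math
from itertools import combinations


def build_corpus(axioms, annotations):
    """Merge ontology axioms and protein annotations into one sentence corpus."""
    corpus = [list(axiom) for axiom in axioms]
    for protein, go_classes in annotations.items():
        for go_class in go_classes:
            # "hasFunction" is a hypothetical relation token, chosen for illustration.
            corpus.append([protein, "hasFunction", go_class])
    return corpus


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_pairs(vectors):
    """Score every unordered protein pair by cosine similarity, highest first."""
    scored = [((p, q), cosine(vectors[p], vectors[q]))
              for p, q in combinations(sorted(vectors), 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def roc_auc(scores, positives):
    """ROC AUC as the chance a positive pair outscores a negative one (ties = 0.5)."""
    pos = [s for pair, s in scores if pair in positives]
    neg = [s for pair, s in scores if pair not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    axioms = [["GO_B", "SubClassOf", "GO_A"], ["GO_C", "SubClassOf", "GO_A"]]
    annotations = {"P1": ["GO_B"], "P2": ["GO_B"], "P3": ["GO_C"]}
    for sentence in build_corpus(axioms, annotations):
        print(" ".join(sentence))

    # Hypothetical embeddings standing in for a trained model's output.
    vectors = {"P1": [0.9, 0.1, 0.0], "P2": [0.8, 0.2, 0.1], "P3": [0.0, 0.3, 0.9]}
    ranked = rank_pairs(vectors)
    known_interactions = {("P1", "P2")}  # hypothetical gold-standard interaction
    print(ranked[0][0], round(roc_auc(ranked, known_interactions), 2))
```

Running the script prints the five corpus sentences, then `('P1', 'P2') 1.0`. The known pair ranks first, so the toy AUC is perfect. With real data, the vectors would come from the trained embedding model. The trainable similarity measures and the clustering described in the abstract would operate on those same vectors.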
id pubmed-6022543
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics, ISMB 2018–Intelligent Systems for Molecular Biology Proceedings (Oxford University Press)
dates 2018-07-10; 2018-07-01; 2018-06-27
rights © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com