
HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball

MOTIVATION: Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representation of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity using...

Bibliographic Details
Main Authors: Kim, Jaesik; Kim, Dokyoon; Sohn, Kyung-Ah
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060726/
https://www.ncbi.nlm.nih.gov/pubmed/33760022
http://dx.doi.org/10.1093/bioinformatics/btab193
author Kim, Jaesik
Kim, Dokyoon
Sohn, Kyung-Ah
collection PubMed
description MOTIVATION: Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representation of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity using an embedding method such as the Word2Vec-based method to represent entities as numeric vectors. However, this method has the limitation that embedding large graph-structured data in the Euclidean space cannot prevent a loss of information of latent hierarchies, thus precluding the semantics of GO and GOA from being captured optimally. On the other hand, hyperbolic spaces such as the Poincaré ball are more suitable for modeling hierarchies, as they have a geometric property in which the distance increases exponentially as it nears the boundary because of negative curvature. RESULTS: In this article, we propose hierarchical representations of GO and genes (HiG2Vec) by applying Poincaré embedding specialized in the representation of hierarchy through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts the interaction of genes or gene products similar to or better than previous studies. The results indicate that HiG2Vec is superior to other methods in capturing the GO and gene semantics and in data utilization as well. It can be robustly applied to manipulate various biological knowledge. AVAILABILITY AND IMPLEMENTATION: https://github.com/JaesikKim/HiG2Vec. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
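The geometric property mentioned in the description (distances growing rapidly near the boundary of the Poincaré ball) can be illustrated with a small sketch. This is not code from the HiG2Vec repository; it uses the standard textbook distance formula for the unit Poincaré ball, and the function name `poincare_distance` and the sample points are assumptions chosen for illustration only.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Standard formula: d(u, v) = arcosh(1 + 2*|u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    Hypothetical helper for illustration; not taken from HiG2Vec.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Two pairs of points separated by the same Euclidean gap (0.05),
# one pair near the origin and one pair near the boundary.
near_origin = poincare_distance([0.00, 0.0], [0.05, 0.0])
near_boundary = poincare_distance([0.90, 0.0], [0.95, 0.0])

print(f"near origin:   {near_origin:.4f}")
print(f"near boundary: {near_boundary:.4f}")
```

The same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary, which is why tree-like structures with many leaves can be spread out there with less distortion.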
format Online
Article
Text
id pubmed-10060726
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10060726 2023-03-31 HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball Kim, Jaesik Kim, Dokyoon Sohn, Kyung-Ah Bioinformatics Original Papers
Oxford University Press 2021-03-24 /pmc/articles/PMC10060726/ /pubmed/33760022 http://dx.doi.org/10.1093/bioinformatics/btab193 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060726/
https://www.ncbi.nlm.nih.gov/pubmed/33760022
http://dx.doi.org/10.1093/bioinformatics/btab193