
Linear functional organization of the omic embedding space

MOTIVATION: We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network...


Bibliographic Details
Main authors: Xenos, A, Malod-Dognin, N, Milinković, S, Pržulj, N
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570782/
https://www.ncbi.nlm.nih.gov/pubmed/34213534
http://dx.doi.org/10.1093/bioinformatics/btab487
_version_ 1784594890977443840
author Xenos, A
Malod-Dognin, N
Milinković, S
Pržulj, N
author_facet Xenos, A
Malod-Dognin, N
Milinković, S
Pržulj, N
author_sort Xenos, A
collection PubMed
description MOTIVATION: We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network embeddings. Such algorithms represent biological macromolecules as vectors in d-dimensional space, in which topologically similar molecules are embedded close in space and knowledge is extracted directly by vector operations. Recently, it has been shown that neural networks used to obtain vectorial representations (embeddings) are implicitly factorizing a mutual information matrix, called Positive Pointwise Mutual Information (PPMI) matrix. Thus, we propose the use of the PPMI matrix to represent the human protein–protein interaction (PPI) network and also introduce the graphlet degree vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network. RESULTS: We generate the embeddings by decomposing these matrices with Nonnegative Matrix Tri-Factorization. We demonstrate that genes that are embedded close in these spaces have similar biological functions, so we can extract new biomedical knowledge directly by doing linear operations on their embedding vector representations. We exploit this property to predict new genes participating in protein complexes and to identify new cancer-related genes based on the cosine similarities between the vector representations of the genes. We validate 80% of our novel cancer-related gene predictions in the literature and also by patient survival curves, demonstrating that 93.3% of them have potential clinical relevance as biomarkers of cancer. AVAILABILITY AND IMPLEMENTATION: Code and data are available online at https://gitlab.bsc.es/axenos/embedded-omics-data-geometry/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
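The two ingredients named in the description, a PPMI matrix built from a network and cosine similarity between node vectors, can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the record: the PPMI is computed from single-step co-occurrence in the adjacency matrix (a simplification; the authors' own construction may differ), the four-node path graph is a hypothetical toy example rather than PPI data, and the PPMI rows are used directly as node vectors instead of embeddings obtained by Nonnegative Matrix Tri-Factorization. The function names `ppmi_matrix` and `cosine_similarity_matrix` are illustrative only.

```python
import numpy as np


def ppmi_matrix(adjacency):
    """PPMI of an undirected graph from single-step co-occurrence.

    Assumed simplification: PMI(i, j) = log(A_ij * vol / (d_i * d_j)),
    where d_i is the degree of node i and vol is the sum of all entries of A.
    """
    a = np.asarray(adjacency, dtype=float)
    total = a.sum()
    degrees = a.sum(axis=1)
    expected = np.outer(degrees, degrees) / total
    # log(0) and 0/0 are expected for non-adjacent or isolated nodes.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(a / expected)
    pmi[~np.isfinite(pmi)] = 0.0
    # Keep only the positive part of the PMI.
    return np.maximum(pmi, 0.0)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors
    unit = v / norms
    return unit @ unit.T


# Hypothetical toy network: a path 0 - 1 - 2 - 3 (not real PPI data).
adjacency = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)

ppmi = ppmi_matrix(adjacency)
# Here the PPMI rows stand in for embedding vectors (an assumption).
sim = cosine_similarity_matrix(ppmi)
```

In this toy graph, nodes 0 and 2 share the neighbor 1 and therefore have a positive cosine similarity, while nodes 0 and 3 share no neighbor and have similarity zero. Ranking nodes by such similarities is the general idea behind similarity-based prediction, although the record itself gives no implementation details.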
format Online
Article
Text
id pubmed-8570782
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8570782 2021-11-08 Linear functional organization of the omic embedding space Xenos, A Malod-Dognin, N Milinković, S Pržulj, N Bioinformatics Original Papers MOTIVATION: We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network embeddings. Such algorithms represent biological macromolecules as vectors in d-dimensional space, in which topologically similar molecules are embedded close in space and knowledge is extracted directly by vector operations. Recently, it has been shown that neural networks used to obtain vectorial representations (embeddings) are implicitly factorizing a mutual information matrix, called Positive Pointwise Mutual Information (PPMI) matrix. Thus, we propose the use of the PPMI matrix to represent the human protein–protein interaction (PPI) network and also introduce the graphlet degree vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network. RESULTS: We generate the embeddings by decomposing these matrices with Nonnegative Matrix Tri-Factorization. We demonstrate that genes that are embedded close in these spaces have similar biological functions, so we can extract new biomedical knowledge directly by doing linear operations on their embedding vector representations. We exploit this property to predict new genes participating in protein complexes and to identify new cancer-related genes based on the cosine similarities between the vector representations of the genes. We validate 80% of our novel cancer-related gene predictions in the literature and also by patient survival curves, demonstrating that 93.3% of them have potential clinical relevance as biomarkers of cancer.
AVAILABILITY AND IMPLEMENTATION: Code and data are available online at https://gitlab.bsc.es/axenos/embedded-omics-data-geometry/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-07-02 /pmc/articles/PMC8570782/ /pubmed/34213534 http://dx.doi.org/10.1093/bioinformatics/btab487 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Xenos, A
Malod-Dognin, N
Milinković, S
Pržulj, N
Linear functional organization of the omic embedding space
title Linear functional organization of the omic embedding space
title_full Linear functional organization of the omic embedding space
title_fullStr Linear functional organization of the omic embedding space
title_full_unstemmed Linear functional organization of the omic embedding space
title_short Linear functional organization of the omic embedding space
title_sort linear functional organization of the omic embedding space
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570782/
https://www.ncbi.nlm.nih.gov/pubmed/34213534
http://dx.doi.org/10.1093/bioinformatics/btab487
work_keys_str_mv AT xenosa linearfunctionalorganizationoftheomicembeddingspace
AT maloddogninn linearfunctionalorganizationoftheomicembeddingspace
AT milinkovics linearfunctionalorganizationoftheomicembeddingspace
AT przuljn linearfunctionalorganizationoftheomicembeddingspace