Linear functional organization of the omic embedding space
Main authors: | Xenos, A; Malod-Dognin, N; Milinković, S; Pržulj, N |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570782/ https://www.ncbi.nlm.nih.gov/pubmed/34213534 http://dx.doi.org/10.1093/bioinformatics/btab487 |
author | Xenos, A Malod-Dognin, N Milinković, S Pržulj, N |
collection | PubMed |
description | MOTIVATION: We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this information, we introduce algorithms based on network embeddings. Such algorithms represent biological macromolecules as vectors in a d-dimensional space, in which topologically similar molecules are embedded close to each other and knowledge is extracted directly by vector operations. Recently, it has been shown that the neural networks used to obtain vectorial representations (embeddings) implicitly factorize a mutual information matrix, the Positive Pointwise Mutual Information (PPMI) matrix. Thus, we propose using the PPMI matrix to represent the human protein–protein interaction (PPI) network, and we also introduce the graphlet degree vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network. RESULTS: We generate the embeddings by decomposing these matrices with Nonnegative Matrix Tri-Factorization. We demonstrate that genes embedded close together in these spaces have similar biological functions, so new biomedical knowledge can be extracted directly by performing linear operations on their embedding vectors. We exploit this property to predict new genes participating in protein complexes and to identify new cancer-related genes, based on the cosine similarities between the genes' vector representations. We validate 80% of our novel cancer-related gene predictions in the literature and also with patient survival curves, which show that 93.3% of them have potential clinical relevance as cancer biomarkers. AVAILABILITY AND IMPLEMENTATION: Code and data are available online at https://gitlab.bsc.es/axenos/embedded-omics-data-geometry/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8570782 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Bioinformatics (Original Papers) |
published | Oxford University Press, 2021-07-02 |
record | pubmed-8570782, 2021-11-08 |
rights | © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
topic | Original Papers |
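The description outlines a three-step pipeline: represent the PPI network as a PPMI matrix, decompose it with Nonnegative Matrix Tri-Factorization (NMTF), and compare genes by the cosine similarity of their embedding vectors. The plain-Python sketch below illustrates those three steps under stated assumptions; it is not the authors' code (that is at the GitLab link in the description). Assumptions: the adjacency matrix itself is used as the co-occurrence counts for the PPMI (this record does not say how the paper builds them), the factorization is the basic unconstrained multiplicative-update NMTF X ≈ U S Vᵀ rather than whatever variant the paper uses, the rows of U are taken as gene embeddings, and the graphlet degree vector PPMI matrix is not covered. The toy network and every function name are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def ppmi(counts: Matrix) -> Matrix:
    """PPMI(i, j) = max(0, log(C_ij * D / (r_i * c_j))); entries with C_ij = 0 stay 0.
    Assumption: `counts` is a co-occurrence matrix; the demo simply uses the adjacency."""
    total = sum(map(sum, counts))
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    return [[max(0.0, math.log(c * total / (rows[i] * cols[j]))) if c > 0 else 0.0
             for j, c in enumerate(r)] for i, r in enumerate(counts)]


def _mult_update(f: Matrix, num: Matrix, den: Matrix, eps: float) -> Matrix:
    """Element-wise f * num / den: the multiplicative step that keeps f nonnegative."""
    return [[fv * nv / (dv + eps) for fv, nv, dv in zip(fr, nr, dr)]
            for fr, nr, dr in zip(f, num, den)]


def nmtf(x: Matrix, k: int, iters: int = 200, seed: int = 0, eps: float = 1e-9):
    """Generic NMTF sketch: x ~ U S V^T with nonnegative U (n x k), S (k x k), V (m x k),
    via multiplicative updates on the squared Frobenius loss (no orthogonality constraints)."""
    rng = random.Random(seed)
    u = [[rng.random() for _ in range(k)] for _ in range(len(x))]
    s = [[rng.random() for _ in range(k)] for _ in range(k)]
    v = [[rng.random() for _ in range(k)] for _ in range(len(x[0]))]
    xt = transpose(x)
    for _ in range(iters):
        st = transpose(s)
        # U <- U * (X V S^T) / (U S V^T V S^T)
        u = _mult_update(u, matmul(matmul(x, v), st),
                         matmul(matmul(matmul(u, s), matmul(transpose(v), v)), st), eps)
        # V <- V * (X^T U S) / (V S^T U^T U S)
        v = _mult_update(v, matmul(matmul(xt, u), s),
                         matmul(matmul(matmul(v, st), matmul(transpose(u), u)), s), eps)
        # S <- S * (U^T X V) / (U^T U S V^T V)
        ut = transpose(u)
        s = _mult_update(s, matmul(matmul(ut, x), v),
                         matmul(matmul(matmul(ut, u), s), matmul(transpose(v), v)), eps)
    return u, s, v


def frobenius_error(x: Matrix, u: Matrix, s: Matrix, v: Matrix) -> float:
    """Squared Frobenius norm of x - U S V^T."""
    approx = matmul(matmul(u, s), transpose(v))
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical toy network: two triangles (0-1-2, 3-4-5) joined by edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0.0] * 6 for _ in range(6)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    x = ppmi(adj)
    u, s, v = nmtf(x, k=2)
    # Gene embeddings are taken as rows of U (an assumption, see the lead-in).
    ranked = sorted(range(1, 6), key=lambda g: cosine(u[0], u[g]), reverse=True)
    print("genes ranked by cosine similarity to gene 0:", ranked)
```

Running the script prints the other five toy genes ranked by cosine similarity to gene 0. The list-of-lists arithmetic is only for self-containment; on a genome-scale PPI network one would use a numerical library and sparse matrices instead.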