Gaussian Embedding for Large-scale Gene Set Analysis

Gene sets, including protein complexes and signaling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available m...

Bibliographic Details
Main Authors: Wang, Sheng; Flynn, Emily R.; Altman, Russ B.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7505077/
https://www.ncbi.nlm.nih.gov/pubmed/32968711
http://dx.doi.org/10.1038/s42256-020-0193-2
_version_ 1783584746053304320
author Wang, Sheng
Flynn, Emily R.
Altman, Russ B.
author_sort Wang, Sheng
collection PubMed
description Gene sets, including protein complexes and signaling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with available machine learning codes. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein-protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumors, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a previously unknown clinical prognostic and predictive subnetwork around NEFM in sarcoma, which we validate in independent cohorts.
format Online
Article
Text
id pubmed-7505077
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7505077 2021-01-01 Gaussian Embedding for Large-scale Gene Set Analysis Wang, Sheng Flynn, Emily R. Altman, Russ B. Nat Mach Intell Article Gene sets, including protein complexes and signaling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets to gain insight into biological discovery requires computational methods for converting them into a useful form for available machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with available machine learning codes. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein-protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumors, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a previously unknown clinical prognostic and predictive subnetwork around NEFM in sarcoma, which we validate in independent cohorts. 2020-06-15 2020-07 /pmc/articles/PMC7505077/ /pubmed/32968711 http://dx.doi.org/10.1038/s42256-020-0193-2 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title Gaussian Embedding for Large-scale Gene Set Analysis
topic Article