Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling

Bibliographic Details
Main authors: Woicik, Addie; Zhang, Mingxin; Xu, Hanwen; Mostafavi, Sara; Wang, Sheng
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Systems Biology and Networks
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311345/
https://www.ncbi.nlm.nih.gov/pubmed/37387142
http://dx.doi.org/10.1093/bioinformatics/btad247
Collection: PubMed
Description:
MOTIVATION: The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks.
RESULTS: To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in [Formula: see text] score, [Formula: see text] improvement in micro-AUPRC, and [Formula: see text] improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini’s performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings’ performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains.
AVAILABILITY AND IMPLEMENTATION: Gemini can be accessed at: https://github.com/MinxZ/Gemini.
Record ID: pubmed-10311345
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics
Section: Systems Biology and Networks
Publication date: 2023-06-30
License: © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.