Cargando…

A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks

BACKGROUND: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is inferring the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between enti...

Descripción completa

Detalles Bibliográficos
Autores principales: Frasca, Marco, Grossi, Giuliano, Gliozzo, Jessica, Mesiti, Marco, Notaro, Marco, Perlasca, Paolo, Petrini, Alessandro, Valentini, Giorgio
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191976/
https://www.ncbi.nlm.nih.gov/pubmed/30367594
http://dx.doi.org/10.1186/s12859-018-2301-4
_version_ 1783363820785238016
author Frasca, Marco
Grossi, Giuliano
Gliozzo, Jessica
Mesiti, Marco
Notaro, Marco
Perlasca, Paolo
Petrini, Alessandro
Valentini, Giorgio
author_facet Frasca, Marco
Grossi, Giuliano
Gliozzo, Jessica
Mesiti, Marco
Notaro, Marco
Perlasca, Paolo
Petrini, Alessandro
Valentini, Giorgio
author_sort Frasca, Marco
collection PubMed
description BACKGROUND: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is inferring the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labellings often are highly unbalanced, that is one class is largely under-represented: for instance in the automated protein function prediction (AFP) for most Gene Ontology terms only few proteins are annotated, or in the disease-gene prioritization problem only few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are thereby required. Furthermore, such methods must be scalable, since input data can be large-sized as, for instance, in the context of multi-species protein networks. RESULTS: We propose a novel semi-supervised parallel enhancement of COSNet, an imbalance-aware algorithm build on Hopfield neural model recently suggested to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures the convergence to asymptotically stable attractors, while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and artificial big instances of the problem highlight scalability and efficiency of the proposed method. CONCLUSIONS: By parallelizing COSNet we achieved on average a speed-up of 180x in solving the AFP problem in the S. cerevisiae, Mus musculus and Homo sapiens organisms, while lowering memory requirements. In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes.
format Online
Article
Text
id pubmed-6191976
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-61919762018-10-23 A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks Frasca, Marco Grossi, Giuliano Gliozzo, Jessica Mesiti, Marco Notaro, Marco Perlasca, Paolo Petrini, Alessandro Valentini, Giorgio BMC Bioinformatics Research BACKGROUND: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is inferring the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labellings often are highly unbalanced, that is one class is largely under-represented: for instance in the automated protein function prediction (AFP) for most Gene Ontology terms only few proteins are annotated, or in the disease-gene prioritization problem only few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches to accurately predict node labels in biological networks are thereby required. Furthermore, such methods must be scalable, since input data can be large-sized as, for instance, in the context of multi-species protein networks. RESULTS: We propose a novel semi-supervised parallel enhancement of COSNet, an imbalance-aware algorithm build on Hopfield neural model recently suggested to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that it can be efficiently applied to networks with millions of nodes. The key strategy to speed up the computations is to partition nodes into independent sets so as to process each set in parallel by exploiting the power of GPU accelerators. This parallel technique ensures the convergence to asymptotically stable attractors, while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and artificial big instances of the problem highlight scalability and efficiency of the proposed method. CONCLUSIONS: By parallelizing COSNet we achieved on average a speed-up of 180x in solving the AFP problem in the S. cerevisiae, Mus musculus and Homo sapiens organisms, while lowering memory requirements. In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks involving hundreds of thousands to millions of nodes. BioMed Central 2018-10-15 /pmc/articles/PMC6191976/ /pubmed/30367594 http://dx.doi.org/10.1186/s12859-018-2301-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Frasca, Marco
Grossi, Giuliano
Gliozzo, Jessica
Mesiti, Marco
Notaro, Marco
Perlasca, Paolo
Petrini, Alessandro
Valentini, Giorgio
A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
title A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
title_full A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
title_fullStr A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
title_full_unstemmed A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
title_short A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks
title_sort gpu-based algorithm for fast node label learning in large and unbalanced biomolecular networks
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191976/
https://www.ncbi.nlm.nih.gov/pubmed/30367594
http://dx.doi.org/10.1186/s12859-018-2301-4
work_keys_str_mv AT frascamarco agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT grossigiuliano agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT gliozzojessica agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT mesitimarco agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT notaromarco agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT perlascapaolo agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT petrinialessandro agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT valentinigiorgio agpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT frascamarco gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT grossigiuliano gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT gliozzojessica gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT mesitimarco gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT notaromarco gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT perlascapaolo gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT petrinialessandro gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks
AT valentinigiorgio gpubasedalgorithmforfastnodelabellearninginlargeandunbalancedbiomolecularnetworks