Cargando…

HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such...

Descripción completa

Detalles Bibliográficos
Autores principales: Azad, Ariful, Pavlopoulos, Georgios A, Ouzounis, Christos A, Kyrpides, Nikos C, Buluç, Aydin
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5888241/
https://www.ncbi.nlm.nih.gov/pubmed/29315405
http://dx.doi.org/10.1093/nar/gkx1313
_version_ 1783312477816094720
author Azad, Ariful
Pavlopoulos, Georgios A
Ouzounis, Christos A
Kyrpides, Nikos C
Buluç, Aydin
author_facet Azad, Ariful
Pavlopoulos, Georgios A
Ouzounis, Christos A
Kyrpides, Nikos C
Buluç, Aydin
author_sort Azad, Ariful
collection PubMed
description Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
format Online
Article
Text
id pubmed-5888241
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-58882412018-04-11 HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks Azad, Ariful Pavlopoulos, Georgios A Ouzounis, Christos A Kyrpides, Nikos C Buluç, Aydin Nucleic Acids Res Methods Online Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. Oxford University Press 2018-04-06 2018-01-05 /pmc/articles/PMC5888241/ /pubmed/29315405 http://dx.doi.org/10.1093/nar/gkx1313 Text en Published by Oxford University Press on behalf of Nucleic Acids Research 2018. This work is written by (a) US Government employee(s) and is in the public domain in the US.
spellingShingle Methods Online
Azad, Ariful
Pavlopoulos, Georgios A
Ouzounis, Christos A
Kyrpides, Nikos C
Buluç, Aydin
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
title HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
title_full HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
title_fullStr HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
title_full_unstemmed HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
title_short HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
title_sort hipmcl: a high-performance parallel implementation of the markov clustering algorithm for large-scale networks
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5888241/
https://www.ncbi.nlm.nih.gov/pubmed/29315405
http://dx.doi.org/10.1093/nar/gkx1313
work_keys_str_mv AT azadariful hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks
AT pavlopoulosgeorgiosa hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks
AT ouzounischristosa hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks
AT kyrpidesnikosc hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks
AT bulucaydin hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks