Cargando…
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2018
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5888241/ https://www.ncbi.nlm.nih.gov/pubmed/29315405 http://dx.doi.org/10.1093/nar/gkx1313 |
_version_ | 1783312477816094720 |
---|---|
author | Azad, Ariful Pavlopoulos, Georgios A Ouzounis, Christos A Kyrpides, Nikos C Buluç, Aydin |
author_facet | Azad, Ariful Pavlopoulos, Georgios A Ouzounis, Christos A Kyrpides, Nikos C Buluç, Aydin |
author_sort | Azad, Ariful |
collection | PubMed |
description | Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. |
format | Online Article Text |
id | pubmed-5888241 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-58882412018-04-11 HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks Azad, Ariful Pavlopoulos, Georgios A Ouzounis, Christos A Kyrpides, Nikos C Buluç, Aydin Nucleic Acids Res Methods Online Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. Oxford University Press 2018-04-06 2018-01-05 /pmc/articles/PMC5888241/ /pubmed/29315405 http://dx.doi.org/10.1093/nar/gkx1313 Text en Published by Oxford University Press on behalf of Nucleic Acids Research 2018. This work is written by (a) US Government employee(s) and is in the public domain in the US. |
spellingShingle | Methods Online Azad, Ariful Pavlopoulos, Georgios A Ouzounis, Christos A Kyrpides, Nikos C Buluç, Aydin HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks |
title | HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks |
title_full | HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks |
title_fullStr | HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks |
title_full_unstemmed | HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks |
title_short | HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks |
title_sort | hipmcl: a high-performance parallel implementation of the markov clustering algorithm for large-scale networks |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5888241/ https://www.ncbi.nlm.nih.gov/pubmed/29315405 http://dx.doi.org/10.1093/nar/gkx1313 |
work_keys_str_mv | AT azadariful hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks AT pavlopoulosgeorgiosa hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks AT ouzounischristosa hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks AT kyrpidesnikosc hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks AT bulucaydin hipmclahighperformanceparallelimplementationofthemarkovclusteringalgorithmforlargescalenetworks |