Cargando…

UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware

With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, gr...

Descripción completa

Detalles Bibliográficos
Autores principales: Kim, Chaeeun, Han, Changhun, Park, Ha-Myung
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710849/
https://www.ncbi.nlm.nih.gov/pubmed/36449449
http://dx.doi.org/10.1371/journal.pone.0277527
_version_ 1784841450149642240
author Kim, Chaeeun
Han, Changhun
Park, Ha-Myung
author_facet Kim, Chaeeun
Han, Changhun
Park, Ha-Myung
author_sort Kim, Chaeeun
collection PubMed
description With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk. Several studies have been proposed to efficiently find connected components in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability as they have to load all data generated during the process into the main memory; they require expensive machines with vast memory capacities to handle large graphs. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to data explosion problems, which is a phenomenon that significantly increases the size of data as the computation proceeds. The latest MapReduce algorithms resolve the problem by proposing two distinguishing star-operations and executing them alternately, while the star-operations still cause massive network traffic as a star-operation is a distributed operation that connects each node to its smallest neighbor. In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm for finding connected components in enormous graphs using UniStar. The partition-aware processing of UniStar effectively resolves the data explosion problems. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experimental results with a cluster of 10 cheap machines each of which is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than graphs competitors can process.
format Online
Article
Text
id pubmed-9710849
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-97108492022-12-01 UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware Kim, Chaeeun Han, Changhun Park, Ha-Myung PLoS One Research Article With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk. Several studies have been proposed to efficiently find connected components in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability as they have to load all data generated during the process into the main memory; they require expensive machines with vast memory capacities to handle large graphs. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to data explosion problems, which is a phenomenon that significantly increases the size of data as the computation proceeds. The latest MapReduce algorithms resolve the problem by proposing two distinguishing star-operations and executing them alternately, while the star-operations still cause massive network traffic as a star-operation is a distributed operation that connects each node to its smallest neighbor. In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm for finding connected components in enormous graphs using UniStar. The partition-aware processing of UniStar effectively resolves the data explosion problems. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experimental results with a cluster of 10 cheap machines each of which is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than graphs competitors can process. Public Library of Science 2022-11-30 /pmc/articles/PMC9710849/ /pubmed/36449449 http://dx.doi.org/10.1371/journal.pone.0277527 Text en © 2022 Kim et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Kim, Chaeeun
Han, Changhun
Park, Ha-Myung
UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_full UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_fullStr UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_full_unstemmed UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_short UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_sort unicon: a unified star-operation to efficiently find connected components on a cluster of commodity hardware
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710849/
https://www.ncbi.nlm.nih.gov/pubmed/36449449
http://dx.doi.org/10.1371/journal.pone.0277527
work_keys_str_mv AT kimchaeeun uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware
AT hanchanghun uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware
AT parkhamyung uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware