Cargando…
UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, gr...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2022
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710849/ https://www.ncbi.nlm.nih.gov/pubmed/36449449 http://dx.doi.org/10.1371/journal.pone.0277527 |
_version_ | 1784841450149642240 |
---|---|
author | Kim, Chaeeun Han, Changhun Park, Ha-Myung |
author_facet | Kim, Chaeeun Han, Changhun Park, Ha-Myung |
author_sort | Kim, Chaeeun |
collection | PubMed |
description | With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk. Several studies have been proposed to efficiently find connected components in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability as they have to load all data generated during the process into the main memory; they require expensive machines with vast memory capacities to handle large graphs. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to data explosion problems, which is a phenomenon that significantly increases the size of data as the computation proceeds. The latest MapReduce algorithms resolve the problem by proposing two distinguishing star-operations and executing them alternately, while the star-operations still cause massive network traffic as a star-operation is a distributed operation that connects each node to its smallest neighbor. In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm for finding connected components in enormous graphs using UniStar. The partition-aware processing of UniStar effectively resolves the data explosion problems. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experimental results with a cluster of 10 cheap machines each of which is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than graphs competitors can process. |
format | Online Article Text |
id | pubmed-9710849 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-97108492022-12-01 UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware Kim, Chaeeun Han, Changhun Park, Ha-Myung PLoS One Research Article With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk. Several studies have been proposed to efficiently find connected components in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability as they have to load all data generated during the process into the main memory; they require expensive machines with vast memory capacities to handle large graphs. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to data explosion problems, which is a phenomenon that significantly increases the size of data as the computation proceeds. The latest MapReduce algorithms resolve the problem by proposing two distinguishing star-operations and executing them alternately, while the star-operations still cause massive network traffic as a star-operation is a distributed operation that connects each node to its smallest neighbor. In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm for finding connected components in enormous graphs using UniStar. The partition-aware processing of UniStar effectively resolves the data explosion problems. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experimental results with a cluster of 10 cheap machines each of which is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than graphs competitors can process. Public Library of Science 2022-11-30 /pmc/articles/PMC9710849/ /pubmed/36449449 http://dx.doi.org/10.1371/journal.pone.0277527 Text en © 2022 Kim et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Kim, Chaeeun Han, Changhun Park, Ha-Myung UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware |
title | UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware |
title_full | UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware |
title_fullStr | UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware |
title_full_unstemmed | UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware |
title_short | UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware |
title_sort | unicon: a unified star-operation to efficiently find connected components on a cluster of commodity hardware |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710849/ https://www.ncbi.nlm.nih.gov/pubmed/36449449 http://dx.doi.org/10.1371/journal.pone.0277527 |
work_keys_str_mv | AT kimchaeeun uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware AT hanchanghun uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware AT parkhamyung uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware |