Cargando…

UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware

With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, gr...

Descripción completa

Detalles Bibliográficos
Autores principales:	Kim, Chaeeun, Han, Changhun, Park, Ha-Myung
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Public Library of Science 2022
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710849/ https://www.ncbi.nlm.nih.gov/pubmed/36449449 http://dx.doi.org/10.1371/journal.pone.0277527

_version_	1784841450149642240
author	Kim, Chaeeun Han, Changhun Park, Ha-Myung
author_facet	Kim, Chaeeun Han, Changhun Park, Ha-Myung
author_sort	Kim, Chaeeun
collection	PubMed
description	With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk. Several studies have been proposed to efficiently find connected components in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability as they have to load all data generated during the process into the main memory; they require expensive machines with vast memory capacities to handle large graphs. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to data explosion problems, which is a phenomenon that significantly increases the size of data as the computation proceeds. The latest MapReduce algorithms resolve the problem by proposing two distinguishing star-operations and executing them alternately, while the star-operations still cause massive network traffic as a star-operation is a distributed operation that connects each node to its smallest neighbor. In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm for finding connected components in enormous graphs using UniStar. The partition-aware processing of UniStar effectively resolves the data explosion problems. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experimental results with a cluster of 10 cheap machines each of which is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than graphs competitors can process.
format	Online Article Text
id	pubmed-9710849
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-97108492022-12-01 UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware Kim, Chaeeun Han, Changhun Park, Ha-Myung PLoS One Research Article With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk. Several studies have been proposed to efficiently find connected components in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability as they have to load all data generated during the process into the main memory; they require expensive machines with vast memory capacities to handle large graphs. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to data explosion problems, which is a phenomenon that significantly increases the size of data as the computation proceeds. The latest MapReduce algorithms resolve the problem by proposing two distinguishing star-operations and executing them alternately, while the star-operations still cause massive network traffic as a star-operation is a distributed operation that connects each node to its smallest neighbor. In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm for finding connected components in enormous graphs using UniStar. The partition-aware processing of UniStar effectively resolves the data explosion problems. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experimental results with a cluster of 10 cheap machines each of which is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than graphs competitors can process. Public Library of Science 2022-11-30 /pmc/articles/PMC9710849/ /pubmed/36449449 http://dx.doi.org/10.1371/journal.pone.0277527 Text en © 2022 Kim et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Kim, Chaeeun Han, Changhun Park, Ha-Myung UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title	UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_full	UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_fullStr	UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_full_unstemmed	UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_short	UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware
title_sort	unicon: a unified star-operation to efficiently find connected components on a cluster of commodity hardware
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710849/ https://www.ncbi.nlm.nih.gov/pubmed/36449449 http://dx.doi.org/10.1371/journal.pone.0277527
work_keys_str_mv	AT kimchaeeun uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware AT hanchanghun uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware AT parkhamyung uniconaunifiedstaroperationtoefficientlyfindconnectedcomponentsonaclusterofcommodityhardware

UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware

Ejemplares similares