Cargando…

FlexGraph: Flexible partitioning and storage for scalable graph mining

How can we analyze large graphs such as the Web, and social networks with hundreds of billions of vertices and edges? Although many graph mining systems have been proposed to perform various graph mining algorithms on such large graphs, they have difficulties in processing Web-scale graphs due to ma...

Descripción completa

Detalles Bibliográficos
Autores principales: Park, Chiwan, Park, Ha-Myung, Kang, U.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6980573/
https://www.ncbi.nlm.nih.gov/pubmed/31978075
http://dx.doi.org/10.1371/journal.pone.0227032
_version_ 1783490967914938368
author Park, Chiwan
Park, Ha-Myung
Kang, U.
author_facet Park, Chiwan
Park, Ha-Myung
Kang, U.
author_sort Park, Chiwan
collection PubMed
description How can we analyze large graphs such as the Web, and social networks with hundreds of billions of vertices and edges? Although many graph mining systems have been proposed to perform various graph mining algorithms on such large graphs, they have difficulties in processing Web-scale graphs due to massive communication and I/O costs caused by communication between workers, and reading subgraphs repeatedly. In this paper, we propose FlexGraph, a scalable distributed graph mining method reducing the costs by exploiting properties of real-world graphs. FlexGraph significantly decreases the communication cost, which is the main bottleneck of distributed systems, by exploiting different edge placement policies based on types of vertices. Furthermore, we propose a flexible storage format to reduce I/O costs when reading input graph repeatedly. Experiments show that FlexGraph succeeds in processing up to 64× larger graphs than existing distributed memory-based graph mining methods, and consistently outperforms previous disk-based graph mining methods.
format Online
Article
Text
id pubmed-6980573
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-69805732020-02-04 FlexGraph: Flexible partitioning and storage for scalable graph mining Park, Chiwan Park, Ha-Myung Kang, U. PLoS One Research Article How can we analyze large graphs such as the Web, and social networks with hundreds of billions of vertices and edges? Although many graph mining systems have been proposed to perform various graph mining algorithms on such large graphs, they have difficulties in processing Web-scale graphs due to massive communication and I/O costs caused by communication between workers, and reading subgraphs repeatedly. In this paper, we propose FlexGraph, a scalable distributed graph mining method reducing the costs by exploiting properties of real-world graphs. FlexGraph significantly decreases the communication cost, which is the main bottleneck of distributed systems, by exploiting different edge placement policies based on types of vertices. Furthermore, we propose a flexible storage format to reduce I/O costs when reading input graph repeatedly. Experiments show that FlexGraph succeeds in processing up to 64× larger graphs than existing distributed memory-based graph mining methods, and consistently outperforms previous disk-based graph mining methods. Public Library of Science 2020-01-24 /pmc/articles/PMC6980573/ /pubmed/31978075 http://dx.doi.org/10.1371/journal.pone.0227032 Text en © 2020 Park et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Park, Chiwan
Park, Ha-Myung
Kang, U.
FlexGraph: Flexible partitioning and storage for scalable graph mining
title FlexGraph: Flexible partitioning and storage for scalable graph mining
title_full FlexGraph: Flexible partitioning and storage for scalable graph mining
title_fullStr FlexGraph: Flexible partitioning and storage for scalable graph mining
title_full_unstemmed FlexGraph: Flexible partitioning and storage for scalable graph mining
title_short FlexGraph: Flexible partitioning and storage for scalable graph mining
title_sort flexgraph: flexible partitioning and storage for scalable graph mining
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6980573/
https://www.ncbi.nlm.nih.gov/pubmed/31978075
http://dx.doi.org/10.1371/journal.pone.0227032
work_keys_str_mv AT parkchiwan flexgraphflexiblepartitioningandstorageforscalablegraphmining
AT parkhamyung flexgraphflexiblepartitioningandstorageforscalablegraphmining
AT kangu flexgraphflexiblepartitioningandstorageforscalablegraphmining