
CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks


Bibliographic Details
Main Authors: Park, Seongsoo, Jeong, Minseop, Han, Hwansoo
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036346/
https://www.ncbi.nlm.nih.gov/pubmed/33810417
http://dx.doi.org/10.3390/s21072321
_version_ 1783676888581931008
author Park, Seongsoo
Jeong, Minseop
Han, Hwansoo
author_facet Park, Seongsoo
Jeong, Minseop
Han, Hwansoo
author_sort Park, Seongsoo
collection PubMed
description To process data from IoT and wearable devices, analysis tasks are often offloaded to the cloud. As the amount of sensing data continues to increase, optimizing data analytics frameworks is critical to the performance of processing sensed data. A key approach to speeding up data analytics frameworks in the cloud is caching intermediate data that are used repeatedly in iterative computations. Existing analytics engines implement caching in various ways: some use run-time mechanisms with dynamic profiling, while others rely on programmers to decide which data to cache. Even though caching has long been investigated in computer systems research, recent data analytics frameworks still leave room for optimization. Because sophisticated caching must consider complex execution contexts such as cache capacity, the size of the data to cache, and victims to evict, no general solution exists for data analytics frameworks. In this paper, we propose an application-specific cost-capacity-aware caching scheme for in-memory data analytics frameworks. We use a cost model, built from multiple representative inputs, and an execution flow analysis, extracted from the DAG schedule, to select primary caching candidates among the intermediate data. Once the candidates are determined, the optimal caching is automatically selected during execution, even when programmers do not manually specify caching for the intermediate data. We implemented our scheme in Apache Spark and experimentally evaluated it on the HiBench benchmarks. Compared to the caching decisions in the original benchmarks, our scheme improves performance by 27% with sufficient cache memory and by 11% with insufficient cache memory.
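To illustrate the idea in the abstract, the following is a minimal, hypothetical sketch of a cost-capacity-aware caching decision: each intermediate dataset (e.g. a Spark RDD) carries an estimated in-memory size, a recomputation cost from a cost model, and a reuse count from DAG analysis, and datasets are cached under a capacity budget. The class, field names, and the greedy benefit-per-megabyte heuristic are illustrative assumptions, not the paper's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One intermediate dataset considered for caching (names are illustrative)."""
    name: str
    size_mb: float         # estimated in-memory footprint
    recompute_cost: float  # cost-model estimate to rebuild the dataset once
    reuses: int            # additional uses found in the DAG schedule

    @property
    def benefit(self) -> float:
        # Caching avoids one recomputation per additional reuse.
        return self.recompute_cost * self.reuses

def select_caches(candidates, capacity_mb):
    """Greedily cache the datasets with the best benefit per MB
    until the cache capacity budget is exhausted."""
    chosen, used = [], 0.0
    for c in sorted(candidates, key=lambda c: c.benefit / c.size_mb, reverse=True):
        if c.reuses > 0 and used + c.size_mb <= capacity_mb:
            chosen.append(c.name)
            used += c.size_mb
    return chosen

# Example: under a tight 450 MB budget, only the dense-benefit dataset fits.
cands = [
    Candidate("lineage_A", 400, 10.0, 3),  # big but reused often
    Candidate("lineage_B", 100, 8.0, 2),   # small, high benefit per MB
    Candidate("lineage_C", 300, 5.0, 0),   # never reused: never worth caching
]
print(select_caches(cands, capacity_mb=450))  # → ['lineage_B']
```

With a larger budget (e.g. 600 MB) the selection expands to include `lineage_A` as well, which mirrors the abstract's observation that the right caching choice depends on available cache capacity, not on cost alone.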
format Online
Article
Text
id pubmed-8036346
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8036346 2021-04-12 CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks Park, Seongsoo Jeong, Minseop Han, Hwansoo Sensors (Basel) Article
MDPI 2021-03-26 /pmc/articles/PMC8036346/ /pubmed/33810417 http://dx.doi.org/10.3390/s21072321 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Park, Seongsoo
Jeong, Minseop
Han, Hwansoo
CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
title CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
title_full CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
title_fullStr CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
title_full_unstemmed CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
title_short CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
title_sort cca: cost-capacity-aware caching for in-memory data analytics frameworks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8036346/
https://www.ncbi.nlm.nih.gov/pubmed/33810417
http://dx.doi.org/10.3390/s21072321
work_keys_str_mv AT parkseongsoo ccacostcapacityawarecachingforinmemorydataanalyticsframeworks
AT jeongminseop ccacostcapacityawarecachingforinmemorydataanalyticsframeworks
AT hanhwansoo ccacostcapacityawarecachingforinmemorydataanalyticsframeworks