DOMe: A deduplication optimization method for the NewSQL database backups

Reducing duplicate data in database backups is an important application scenario for data deduplication technology. NewSQL is an emerging class of database system that is being used more and more widely. NewSQL systems improve data reliability by periodically backing up in-memory data, resulting...

Bibliographic Details
Main Authors:	Wang, Longxiang, Zhu, Zhengdong, Zhang, Xingjun, Dong, Xiaoshe, Wang, Yinfeng
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5648134/ https://www.ncbi.nlm.nih.gov/pubmed/29049307 http://dx.doi.org/10.1371/journal.pone.0185189

_version_	1783272343105175552
author	Wang, Longxiang Zhu, Zhengdong Zhang, Xingjun Dong, Xiaoshe Wang, Yinfeng
author_facet	Wang, Longxiang Zhu, Zhengdong Zhang, Xingjun Dong, Xiaoshe Wang, Yinfeng
author_sort	Wang, Longxiang
collection	PubMed
description	Reducing duplicate data in database backups is an important application scenario for data deduplication technology. NewSQL is an emerging class of database system that is being used more and more widely. NewSQL systems improve data reliability by periodically backing up in-memory data, which produces a large amount of duplicate data. Traditional deduplication methods are not optimized for NewSQL server systems and cannot take full advantage of hardware resources to optimize deduplication performance. Recent research has pointed out that future NewSQL servers will have thousands of CPU cores, large DRAM, and huge NVRAM. How to use these hardware resources to optimize the performance of data deduplication is therefore an important issue. To solve this problem, we propose a deduplication optimization method (DOMe) for NewSQL system backup. To take advantage of the large number of CPU cores in the NewSQL server, DOMe parallelizes the deduplication method based on the fork-join framework. The fingerprint index, which is the key data structure in the deduplication process, is implemented as a pure in-memory hash table. This makes full use of the large DRAM in the NewSQL system and eliminates the fingerprint index performance bottleneck of traditional deduplication methods. H-Store is used as a typical NewSQL database system in which to implement DOMe. DOMe is evaluated experimentally on two representative backup data sets. The experimental results show that: 1) DOMe can reduce duplicate NewSQL backup data; 2) DOMe significantly improves deduplication performance by parallelizing content-defined chunking (CDC) algorithms: when the theoretical speedup ratio of the server is 20.8, DOMe achieves a speedup ratio of up to 18; 3) DOMe improves deduplication throughput by 1.5 times through the pure in-memory index optimization.
format	Online Article Text
id	pubmed-5648134
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5648134 2017-11-03 DOMe: A deduplication optimization method for the NewSQL database backups Wang, Longxiang Zhu, Zhengdong Zhang, Xingjun Dong, Xiaoshe Wang, Yinfeng PLoS One Research Article Public Library of Science 2017-10-19 /pmc/articles/PMC5648134/ /pubmed/29049307 http://dx.doi.org/10.1371/journal.pone.0185189 Text en © 2017 Wang et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Wang, Longxiang Zhu, Zhengdong Zhang, Xingjun Dong, Xiaoshe Wang, Yinfeng DOMe: A deduplication optimization method for the NewSQL database backups
title	DOMe: A deduplication optimization method for the NewSQL database backups
title_full	DOMe: A deduplication optimization method for the NewSQL database backups
title_fullStr	DOMe: A deduplication optimization method for the NewSQL database backups
title_full_unstemmed	DOMe: A deduplication optimization method for the NewSQL database backups
title_short	DOMe: A deduplication optimization method for the NewSQL database backups
title_sort	dome: a deduplication optimization method for the newsql database backups
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5648134/ https://www.ncbi.nlm.nih.gov/pubmed/29049307 http://dx.doi.org/10.1371/journal.pone.0185189
work_keys_str_mv	AT wanglongxiang domeadeduplicationoptimizationmethodforthenewsqldatabasebackups AT zhuzhengdong domeadeduplicationoptimizationmethodforthenewsqldatabasebackups AT zhangxingjun domeadeduplicationoptimizationmethodforthenewsqldatabasebackups AT dongxiaoshe domeadeduplicationoptimizationmethodforthenewsqldatabasebackups AT wangyinfeng domeadeduplicationoptimizationmethodforthenewsqldatabasebackups
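
The description field above names three ingredients: content-defined chunking (CDC), fork-join parallelism, and a fingerprint index held as a pure in-memory hash table. The following is a minimal illustrative sketch of how those pieces can fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the simple shift-and-add boundary hash, the chunk-size constants, SHA-1 as the fingerprint, and a Python thread pool standing in for a fork-join framework. It shows the structure of the computation only and makes no claim about speedup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical parameters, chosen only to keep the example small.
MASK = 0x3F       # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 16    # never cut before this many bytes
MAX_CHUNK = 256   # always cut at this many bytes


def cdc_chunks(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries using a shift-and-add hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fingerprint(chunk: bytes) -> str:
    """Fingerprint a chunk; SHA-1 is an assumption, not taken from the source."""
    return hashlib.sha1(chunk).hexdigest()


def deduplicate(segments: List[bytes], index: Dict[str, bytes],
                workers: int = 4) -> List[str]:
    """Fork: chunk and fingerprint segments concurrently.
    Join: merge results, in order, into the in-memory fingerprint index.
    Returns the recipe (ordered fingerprints) needed to rebuild the backup."""
    def process(segment: bytes):
        return [(fingerprint(c), c) for c in cdc_chunks(segment)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_segment = list(pool.map(process, segments))

    recipe = []
    for pairs in per_segment:
        for fp, chunk in pairs:
            index.setdefault(fp, chunk)  # store only chunks not seen before
            recipe.append(fp)
    return recipe


def restore(recipe: List[str], index: Dict[str, bytes]) -> bytes:
    """Rebuild the original bytes from a recipe and the index."""
    return b"".join(index[fp] for fp in recipe)
```

Backing up the same segments a second time with the same index adds no new entries to it, which is the duplicate reduction the record describes; only the recipe has to be stored for the second backup.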
