
Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns

Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files ca...


Bibliographic Details
Main Authors: Pan, Shaoming, Li, Yongkai, Xu, Zhengquan, Chong, Yanwen
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504474/
https://www.ncbi.nlm.nih.gov/pubmed/26181628
http://dx.doi.org/10.1371/journal.pone.0133029
_version_ 1782381464756682752
author Pan, Shaoming
Li, Yongkai
Xu, Zhengquan
Chong, Yanwen
collection PubMed
description Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10–15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.
format Online
Article
Text
id pubmed-4504474
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4504474 2015-07-17 PLoS One Research Article
Public Library of Science 2015-07-16 /pmc/articles/PMC4504474/ /pubmed/26181628 http://dx.doi.org/10.1371/journal.pone.0133029 Text en © 2015 Pan et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504474/
https://www.ncbi.nlm.nih.gov/pubmed/26181628
http://dx.doi.org/10.1371/journal.pone.0133029
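
The description in this record outlines two steps: building an access correlation matrix from historical access logs, then using a heuristic to place files on storage nodes so that files requested together can be read in parallel. The Python sketch below illustrates that general idea only. It is not the authors' algorithm: the session-based log format, the pair-count correlation measure, the greedy placement rule, and the names `build_correlation`, `pair_weight`, and `greedy_placement` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_correlation(sessions):
    """Count how often each pair of files appears in the same session.

    `sessions` is an assumed log format: a list of lists of file names,
    one inner list per user session.
    """
    corr = defaultdict(int)
    for session in sessions:
        # Deduplicate and sort so each pair has one canonical key (a < b).
        for a, b in combinations(sorted(set(session)), 2):
            corr[(a, b)] += 1
    return corr


def pair_weight(corr, a, b):
    """Look up the co-access count for an unordered pair of files."""
    key = (a, b) if a < b else (b, a)
    return corr.get(key, 0)


def greedy_placement(files, corr, num_nodes):
    """Assumed heuristic: put each file on the node where it is least
    correlated with the files already stored there, so that files
    requested together tend to land on different nodes."""
    nodes = [[] for _ in range(num_nodes)]
    # Place the most heavily correlated files first; ties break by name.
    total = {
        f: sum(pair_weight(corr, f, g) for g in files if g != f)
        for f in files
    }
    for f in sorted(files, key=lambda name: (-total[name], name)):
        # Cost of a node = co-access weight with its current contents;
        # ties break on node size, then node index.
        costs = [
            (sum(pair_weight(corr, f, g) for g in node), len(node), i)
            for i, node in enumerate(nodes)
        ]
        best = min(costs)[2]
        nodes[best].append(f)
    return nodes
```

Calling `greedy_placement(files, build_correlation(sessions), num_nodes)` returns one list of file names per node. A greedy rule like this is simple to follow but offers no optimality guarantee, and it ignores the copy storage strategy the description also mentions.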