
Analysis and improvement of data-set level file distribution in Disk Pool Manager

Bibliographic Details
Main authors: Skipsey, Samuel Cadellin; Purdie, Stuart; Britton, David; Mitchell, Mark; Bhimji, Wahid; Smith, David
Language: English
Published: 2014
Online access: https://dx.doi.org/10.1088/1742-6596/513/4/042042
http://cds.cern.ch/record/2026338
Collection: CERN
Abstract: Of the three most widely used implementations of the WLCG Storage Element specification, Disk Pool Manager [1, 2] (DPM) has the simplest implementation of file placement balancing (StoRM does not attempt this at all, leaving it to the underlying filesystem, which can itself be very sophisticated). DPM places files across filesystems and servers with a round-robin algorithm (with optional filesystem weighting). This distributes files reasonably evenly across the storage array as a whole. However, it offers no guarantee that the subset of files belonging to a given 'dataset' (which often maps onto a 'directory' in the DPM namespace, DPNS) is evenly distributed. It is useful to define a notion of 'balance', where an optimally balanced set of files is one distributed evenly across all of the pool nodes. At best, the round-robin algorithm maintains existing balance; it has no mechanism to improve it. Over the past year or more, larger DPM sites have observed load spikes on individual disk servers and suspected that these were exacerbated by concentrations of files from popular datasets on those servers. We present a software tool that analyses file distribution for all datasets in a DPM SE and provides a measure of how poorly files are placed in this respect. The tool also produces a list of file movements that will improve dataset-level file distribution, and can carry out those movements itself. We present results of such an analysis on the UKI-SCOTGRID-GLASGOW Production DPM.
Record ID: oai-inspirehep.net-1302117
Institution: European Organization for Nuclear Research (CERN)
Subjects: Computing and Computers
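The abstract's point, that round-robin placement can keep a storage array balanced overall while leaving an individual dataset concentrated on one server, can be illustrated with a short Python sketch. This is not the authors' tool. The placement function is a simplified round-robin over whole disk servers (DPM places across filesystems and servers and supports optional filesystem weighting, neither of which is modelled here). The imbalance measure (files above a per-node ceiling of ceil(n/k)) and the greedy move planner are hypothetical choices made for illustration, as are the function, node, and dataset names.

```python
import math
from itertools import cycle


def round_robin_place(writes, nodes):
    """Assign each (dataset, filename) write to the next node in a fixed cycle.

    Hypothetical, simplified stand-in for DPM placement: it cycles over whole
    disk servers and ignores filesystems and weighting.
    """
    node_cycle = cycle(nodes)
    return {(dataset, filename): next(node_cycle) for dataset, filename in writes}


def dataset_counts(placement, dataset, nodes):
    """Count the files of one dataset on each node, including empty nodes."""
    counts = {node: 0 for node in nodes}
    for (ds, _filename), node in placement.items():
        if ds == dataset:
            counts[node] += 1
    return counts


def excess(counts):
    """Hypothetical imbalance measure for one dataset.

    Counts the files sitting above the per-node ceiling ceil(n / k). This is the
    minimum number of single-file moves needed so that no node holds more than
    ceil(n / k) of the dataset's files; 0 means that condition already holds.
    """
    n, k = sum(counts.values()), len(counts)
    ceiling = math.ceil(n / k)
    return sum(max(0, c - ceiling) for c in counts.values())


def plan_moves(placement, dataset, nodes):
    """Greedily move files from over-full nodes to the least-loaded node.

    Returns (filename, source, destination) tuples. Applying them leaves every
    node with at most ceil(n / k) files of the dataset, using exactly
    excess(...) moves. Free space and server load are ignored.
    """
    files_on = {node: [] for node in nodes}
    for (ds, filename), node in placement.items():
        if ds == dataset:
            files_on[node].append(filename)
    n = sum(len(files) for files in files_on.values())
    ceiling = math.ceil(n / len(nodes))
    moves = []
    for src in nodes:
        while len(files_on[src]) > ceiling:
            # Some node must be below the ceiling; pick the emptiest one.
            dst = min(nodes, key=lambda node: len(files_on[node]))
            filename = files_on[src].pop()
            files_on[dst].append(filename)
            moves.append((filename, src, dst))
    return moves


if __name__ == "__main__":
    nodes = ["disk01", "disk02", "disk03"]
    # Interleaved write stream: every third write belongs to "popular".
    writes = []
    for i in range(6):
        writes += [("popular", f"p{i}"), ("other", f"a{i}"), ("other", f"b{i}")]
    placement = round_robin_place(writes, nodes)
    counts = dataset_counts(placement, "popular", nodes)
    print(counts)  # {'disk01': 6, 'disk02': 0, 'disk03': 0}
    print(excess(counts))  # 4
    for move in plan_moves(placement, "popular", nodes):
        print(move)  # four moves, two each to disk02 and disk03
```

In this example each server receives exactly six of the eighteen writes, yet all six "popular" files land on disk01, because the dataset's write period happens to match the number of servers. Planning moves one dataset at a time, as here, can shift the overall fill level between servers, so a practical tool would presumably also need to account for free space and existing load.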