Predicting data popularity using Bayesian networks over ATLAS grid sites
Main authors: | , , , |
---|---|
Language: | eng |
Published: | 2012 |
Subjects: | |
Online access: | http://cds.cern.ch/record/1462239 |
Summary: | One of the primary tasks in managing resource utilization for ATLAS Distributed Computing is to replicate newly obtained data from the Production and Distributed Analysis System (PanDA) over the grid while minimizing the number of data replicas. On the other hand, if a dataset becomes popular, additional replicas should be encouraged so that the workload is distributed more evenly. To make this feasible, it is important to know, with good probability, how popular particular datasets will be in the future. We focus on the analysis of data usage in the PanDA system, which provides efficient and transparent utilization of the grid for production and analysis tasks. The initial data popularity analysis was presented in “A Probabilistic Analysis of Data Popularity in ATLAS Data Caching”. From that work emerged the idea of using Bayesian networks (high-level representations of a probability distribution over a set of stochastic variables, used to build a model of the problem domain) for popularity prediction, with conditioning parameters defined on the basis of the data usage analysis. |
---|